[Box-Admins] box3.squeak.org off line - HELP needed

Ken Causey ken at kencausey.com
Tue Oct 8 19:12:12 UTC 2013


The answer is yes, but of course source.squeak.org is so much smaller in
terms of the data it has to handle and its traffic level that its swings
in memory usage are rarely a significant problem.

I should have said something more when you started discussing this, and
I apologize for my level of silence. I'm concerned that, given the amount
of trouble the original owners had with SqueakSource.com, we can't expect
to do much better, particularly on a virtual server with only 1GB of RAM
allocated to it. Setting it to read-only (and I'm not sure, but perhaps
that is how it is set now) will of course reduce the load. But by how
much? That's not an easy question to answer.
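
One knob that might help within the 1GB constraint, purely as a sketch: if
I recall correctly, the Unix Squeak VM accepts a -memory option to cap the
image's object memory, so a runaway image would hit its own ceiling rather
than dragging the whole box into swap. Something like the following, with
the 512m figure only an illustration, not a recommendation:

  /usr/local/lib/squeak/4.10.5-2619/squeakvm -vm-display-null -memory 512m \
    squeaksource.2.image

Whether the squeaksource image can actually live inside such a cap is, of
course, the same open question.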

Ken

On 10/08/2013 01:08 PM, David T. Lewis wrote:
> Aha. That would do it for sure. So something is going on in the squeaksource
> image that is using a *lot* of object memory for some period of time. The use
> of 957m resident memory would very likely be enough to cause the symptoms
> that we saw.
>
> Do you know if we see any similar pattern of memory usage on the source.squeak.org
> server? I'm already convinced that the squeaksource.com image badly needs
> to be updated to the same level of Squeak/Seaside/SqueakSource as our
> source.squeak.org server (due to socket leak problems if nothing else).
>
> I also recall that squeaksource.com on the SCG server had horrible performance
> problems whenever we tried to commit a large MCZ to the VMMaker repository,
> and I always assumed that it was memory-related in some way or another.
>
> Thanks a lot Ken,
>
> Dave
>
>
>
> On Tue, Oct 08, 2013 at 12:37:17PM -0500, Ken Causey wrote:
>> oom_killer
>>
>>    PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
>>
>>
>> 12929 ssdotcom  20   0 1028m 957m  784 R 99.9 95.0  12:05.45 /usr/local/lib/squeak/4.10.5-2619/squeakvm -vm-display-null squeaksource.2.image
>>
>>
>> Note the 6th field (RES).
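>>
>> A rough sketch for catching the growth as it happens: something along
>> these lines would log the VM's resident set every 30 seconds (the log
>> path below is only an example; ps -C matches on the command name):
>>
>>   while true; do
>>     date >> /home/ssdotcom/tmp/squeakvm-mem.log
>>     ps -C squeakvm -o pid=,rss=,vsz=,cmd= >> /home/ssdotcom/tmp/squeakvm-mem.log
>>     sleep 30
>>   done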
>>
>> Ken
>>
>> On 10/08/2013 12:22 PM, Frank Shearar wrote:
>>> On 8 October 2013 17:48, David T. Lewis <lewis at mail.msen.com> wrote:
>>>> On Tue, Oct 08, 2013 at 04:33:28PM +0100, Frank Shearar wrote:
>>>>>> On 6 October 2013 19:00, David T. Lewis <lewis at mail.msen.com> wrote:
>>>>>> On Sun, Oct 06, 2013 at 11:18:25AM -0400, David T. Lewis wrote:
>>>>>>> On Sun, Oct 06, 2013 at 04:52:21PM +0200, Tobias Pape wrote:
>>>>>>>>
>>>>>>>> So, uptime said:
>>>>>>>> root at box3-squeak:/home/ssdotcom# uptime
>>>>>>>>   16:32:49 up 166 days, 22 min,  1 user,  load average: 0.70, 4.74, 8.52
>>>>>>>>
>>>>>>>> And the _last two_ numbers (the 5- and 15-minute load averages) are
>>>>>>>> concerning: basically, the server had been overloaded for a while,
>>>>>>>> even though the 1-minute figure had already dropped back to 0.70.
>>>>>>>> The Squeak VM uses about a gig of virtual memory (really?) and seems
>>>>>>>> to compete with the Jenkins running on the server. htop says Jenkins
>>>>>>>> uses 25% of the system's memory while Squeak uses 19% (both of which
>>>>>>>> I deem high).
>>>>>>>> So in the event of some Jenkins jobs firing off and SqueakSource
>>>>>>>> answering some requests, the server might become unresponsive?
>>>>>>>>
>>>>>>>
>>>>>>> Something like that I think. I'm not sure what was generating the
>>>>>>> load, although there is no question that adding squeaksource to box3
>>>>>>> adds a significant resource demand above that of the Jenkins jobs.
>>>>>>>
>>>>>>> Allocating a big address space (1G) is normal for the VM, and in this
>>>>>>> case the image is actually using a bit under 200MB, which is 20% of
>>>>>>> the system memory. If there is some combination of squeaksource and
>>>>>>> jenkins activity that pushes the total demand to the point of
>>>>>>> requiring swapping, then it's possible that this would make the
>>>>>>> system unresponsive as I was seeing.
>>>>>>>
>>>>>>> A number of the Jenkins jobs run squeak VMs in addition to the Java
>>>>>>> stuff, so some combination of these might add up to a problem.
>>>>>>>
>>>>>>
>>>>>> I am now running top every 30 seconds for the next 24 hours, with
>>>>>> output directed to ~ssdotcom/tmp/top.out. Possibly this will show us
>>>>>> something interesting.
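>>>>>>
>>>>>> The invocation is roughly of this shape (sketching from memory, so
>>>>>> treat the exact flags as approximate): batch mode, a 30-second delay,
>>>>>> and 2880 iterations to cover the 24 hours:
>>>>>>
>>>>>>   top -b -d 30 -n 2880 > ~ssdotcom/tmp/top.out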
>>>>>
>>>>> If that's still running, it's probably saying "ow! ow! stop it!" right
>>>>> now. If Tony Garnock-Jones & I could figure out why jobs are failing
>>>>> on his slaves, I'd suggest moving builds off the box entirely. I'll
>>>>> probably turn my old laptop into a build slave... once I can get it up
>>>>> & running again. That too will help with keeping work off the box.
>>>>>
>>>>
>>>> No worries, I only ran it for a 24-hour period. I saw occasional load
>>>> increases, but nothing like the "load average: 0.70, 4.74, 8.52" that
>>>> Tobias spotted right after the outage. I think we just need to keep
>>>> our eyes open for problems in case it comes back ... usually if a thing
>>>> can fail once, it will fail again eventually ;-)
>>>
>>> Hm, OK, so build.squeak.org could be unavailable for a different reason!
>>>
>>> frank
>>>
>>>> Dave
>>>>
>>>
>>>
>
>


