[Box-Admins] box3.squeak.org offline - HELP needed

Ken Causey ken at kencausey.com
Tue Oct 8 17:37:17 UTC 2013


oom_killer

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
12929 ssdotcom  20   0 1028m 957m  784 R 99.9 95.0  12:05.45 /usr/local/lib/squeak/4.10.5-2619/squeakvm -vm-display-null squeaksource.2.image


Note the 6th field (RES, the resident memory): 957m, or 95.0% of the machine's memory.
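
In top's default column order the 6th field is RES, the resident (physical) memory of the process. A minimal Python sketch that pulls it out of the row above and back-computes the machine's RAM from %MEM; the helper names are hypothetical, and it assumes procps-style output where an unsuffixed memory value is in KiB:

```python
# Field names, in order, from the top header line quoted above.
HEADER = "PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND".split()


def parse_top_row(row: str) -> dict:
    """Split one top process row into named fields; COMMAND keeps its internal spaces."""
    parts = row.split(None, len(HEADER) - 1)
    return dict(zip(HEADER, parts))


def to_mib(value: str) -> float:
    """Convert a top memory value such as '957m' or '784' to MiB."""
    units = {"k": 1 / 1024, "m": 1.0, "g": 1024.0}
    suffix = value[-1].lower()
    if suffix in units:
        return float(value[:-1]) * units[suffix]
    return float(value) / 1024  # assumption: no suffix means KiB


row = ("12929 ssdotcom  20   0 1028m 957m  784 R 99.9 95.0  12:05.45 "
       "/usr/local/lib/squeak/4.10.5-2619/squeakvm -vm-display-null squeaksource.2.image")
fields = parse_top_row(row)
res = to_mib(fields["RES"])
mem_pct = float(fields["%MEM"])
print(HEADER[5], fields["RES"])                                # RES 957m
print(f"implied total RAM ~ {res / (mem_pct / 100):.0f} MiB")  # implied total RAM ~ 1007 MiB
```

With RES at 957m and %MEM at 95.0, total RAM works out to roughly 1 GB, which agrees with the estimate further down the thread that ~200MB is 20% of system memory.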

Ken

On 10/08/2013 12:22 PM, Frank Shearar wrote:
> On 8 October 2013 17:48, David T. Lewis <lewis at mail.msen.com> wrote:
>> On Tue, Oct 08, 2013 at 04:33:28PM +0100, Frank Shearar wrote:
>>> On 6 October 2013 19:00, David T. Lewis <lewis at mail.msen.com> wrote:
>>>> On Sun, Oct 06, 2013 at 11:18:25AM -0400, David T. Lewis wrote:
>>>>> On Sun, Oct 06, 2013 at 04:52:21PM +0200, Tobias Pape wrote:
>>>>>>
>>>>>> So, uptime said:
>>>>>> root@box3-squeak:/home/ssdotcom# uptime
>>>>>>   16:32:49 up 166 days, 22 min,  1 user,  load average: 0.70, 4.74, 8.52
>>>>>>
>>>>>> And the _last two_ numbers (the 5- and 15-minute load averages) are concerning. Basically, the server was overloaded.
>>>>>> The Squeak VM uses about a gig of virtual memory (really?) and seems to compete with
>>>>>> the Jenkins instance running on the server. htop says Jenkins uses 25% of the system's memory
>>>>>> while Squeak uses 19% (both of which I deem high).
>>>>>> So in the event of some Jenkins jobs firing off and Squeaksource answering some requests,
>>>>>> the server might become unresponsive?
>>>>>>
>>>>>
>>>>> Something like that I think. I'm not sure what was generating the load, although
>>>>> there is no question that adding squeaksource to box3 adds a significant resource
>>>>> demand above that of the Jenkins jobs.
>>>>>
>>>>> Allocating a big address space (1G) is normal for the VM, and in this case the
>>>>> image is actually using a bit under 200MB, which is 20% of the system memory.
>>>>> If there is some combination of squeaksource and jenkins activity that pushes
>>>>> the total demand to the point of requiring swapping, then it's possible that
>>>>> this would make the system unresponsive as I was seeing.
>>>>>
>>>>> A number of the Jenkins jobs run squeak VMs in addition to the Java stuff,
>>>>> so some combination of these might add up to a problem.
>>>>>
>>>>
>>>> I am now running top every 30 seconds for the next 24 hours, with output directed
>>>> to ~ssdotcom/tmp/top.out. Possibly this will show us something interesting.
>>>
>>> If that's still running, it's probably saying "ow! ow! stop it!" right
>>> now. If Tony Garnock-Jones & I could figure out why jobs are failing
>>> on his slaves, I'd suggest moving builds off the box entirely. I'll
>>> probably turn my old laptop into a build slave... once I can get it up
>>> & running again. That too will help with keeping work off the box.
>>>
>>
>> No worries, I only ran it for a 24 hour period. I saw occasional load
>> increases, but nothing like the "load average: 0.70, 4.74, 8.52" that
>> Tobias spotted right after the outage. I think we just need to keep
>> our eyes open for problems in case it comes back ... usually if a thing
>> can fail once, it will fail again eventually ;-)
>
> Hm, OK, so build.squeak.org could be unavailable for a different reason!
>
> frank
>
>> Dave
>>
>
>
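
For reference, the three numbers after "load average:" in Tobias's uptime output are the 1-, 5- and 15-minute averages of runnable tasks (on Linux, tasks in uninterruptible sleep, such as those waiting on swap or disk, are counted too). A low first value with high later ones means the overload happened within the preceding quarter hour and was already subsiding. A small Python sketch that extracts the values and reports the trend; the function names are illustrative, and whether a given value counts as "overloaded" also depends on the core count, which the thread does not give:

```python
import re


def parse_load_average(uptime_line: str) -> dict:
    """Extract the 1-, 5- and 15-minute load averages from a line of uptime output."""
    m = re.search(r"load average:\s*([\d.]+),\s*([\d.]+),\s*([\d.]+)", uptime_line)
    if m is None:
        raise ValueError("no load average found")
    one, five, fifteen = (float(x) for x in m.groups())
    return {"1min": one, "5min": five, "15min": fifteen}


def trend(loads: dict) -> str:
    """Compare the shortest and longest windows to see which way the load is moving."""
    if loads["1min"] < loads["15min"]:
        return "falling"
    if loads["1min"] > loads["15min"]:
        return "rising"
    return "steady"


line = " 16:32:49 up 166 days, 22 min,  1 user,  load average: 0.70, 4.74, 8.52"
loads = parse_load_average(line)
print(loads)          # {'1min': 0.7, '5min': 4.74, '15min': 8.52}
print(trend(loads))   # falling
```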
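
Dave's point that a 1G address space is normal for the VM while the image actually uses far less is the VIRT/RES distinction in top. On Linux the same two numbers appear as VmSize and VmRSS (in kB) in /proc/<pid>/status. A sketch that parses them; the helper names are hypothetical, and the sample values below are made up to match the figures in the thread, not taken from box3:

```python
def vm_usage_kib(status_text: str) -> dict:
    """Return VmSize (virtual) and VmRSS (resident) in KiB from /proc/<pid>/status text."""
    usage = {}
    for line in status_text.splitlines():
        key, _, rest = line.partition(":")
        if key in ("VmSize", "VmRSS"):
            usage[key] = int(rest.split()[0])  # the number is followed by "kB"
    return usage


def read_status(pid: int) -> str:
    """Read the status file of a live process (Linux only)."""
    with open(f"/proc/{pid}/status") as f:
        return f.read()


# Hypothetical sample: ~1 GiB of address space reserved, about 193 MiB resident.
sample = "Name:\tsqueakvm\nVmSize:\t 1052672 kB\nVmRSS:\t  198000 kB\n"
usage = vm_usage_kib(sample)
print(f"{usage['VmRSS'] / usage['VmSize']:.0%} of the address space is resident")  # 19% ...
```

Only the resident part competes with the Jenkins processes for physical memory; the rest of the reservation costs nothing until the VM actually touches it.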
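
Dave doesn't say how he ran top every 30 seconds; with procps top a single command such as top -b -d 30 -n 2880 > ~ssdotcom/tmp/top.out would do it (-b batch mode, -d delay in seconds, -n number of iterations; 24 h / 30 s = 2880). A Python sketch of the same idea that also timestamps each snapshot; the function name, its parameters and the timestamp format are assumptions:

```python
import subprocess
import time
from pathlib import Path

INTERVAL_S = 30                          # seconds between snapshots
SAMPLES = 24 * 60 * 60 // INTERVAL_S     # 2880 snapshots = 24 hours


def sample_top(out_path: Path, samples: int = SAMPLES, interval: float = INTERVAL_S,
               cmd: tuple = ("top", "-b", "-n", "1")) -> None:
    """Append `samples` timestamped batch-mode top snapshots to out_path."""
    out_path.parent.mkdir(parents=True, exist_ok=True)
    with out_path.open("a") as out:
        for i in range(samples):
            out.write(f"=== {time.strftime('%Y-%m-%d %H:%M:%S')} ===\n")
            snapshot = subprocess.run(cmd, capture_output=True, text=True, check=True)
            out.write(snapshot.stdout)
            out.flush()                  # keep the file current while the run is in progress
            if i < samples - 1:
                time.sleep(interval)     # no sleep after the last snapshot
```

For the 24-hour run described in the thread this would be called as sample_top(Path("~ssdotcom/tmp/top.out").expanduser()).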
