[Box-Admins] box3.squeak.org off line - HELP needed

David T. Lewis lewis at mail.msen.com
Sun Oct 6 15:18:25 UTC 2013


On Sun, Oct 06, 2013 at 04:52:21PM +0200, Tobias Pape wrote:
> On 06.10.2013 at 16:39, "David T. Lewis" <lewis at mail.msen.com> wrote:
> 
> > On Sun, Oct 06, 2013 at 04:31:05PM +0200, Tobias Pape wrote:
> >> On 06.10.2013 at 16:15, "David T. Lewis" <lewis at mail.msen.com> wrote:
> >> 
> >>> I cannot ping box3.squeak.org, which is our build.squeak.org and squeaksource.com
> >>> server. The box4.squeak.org box is responding, but box3 times out on a ping.
> >>> 
> >>> Can someone please check it and restart if needed? If rebooted, the squeaksource.com
> >>> service will require a manual restart as per ~ssdotcom/README.
> >>> 
> >>> I have limited connectivity today but will try to keep in touch.
> >>> 
> >>> A root cause analysis will be in order, and since squeaksource.com is the
> >>> most recent change on box3.squeak.org, that will be an obvious suspect.
> >>> 
> >> 
> >> Apparently, box3/Squeaksource is up and running. I cannot see anything suspicious on the
> >> box (which is clearly reachable via ssh?)
> >> 
> >> Best
> >> 	-Tobias
> > 
> > Thanks for the quick reply!
> > 
> > It is on line again for me now as well, and I can see that the squeaksource
> > image has been running without interruption for a couple of days, so the
> > box3 server did not go down.
> > 
> > The server was definitely not responding to my pings for a period of at least
> > several minutes (possibly much longer), and the box4 server was responding at
> > that time, so a network problem seems unlikely.
> > 
> > Still concerned but feeling much better now,
> 
> So, uptime said:
> root at box3-squeak:/home/ssdotcom# uptime
>  16:32:49 up 166 days, 22 min,  1 user,  load average: 0.70, 4.74, 8.52
> 
> And the _last two_ numbers are concerning. Basically, the server was overloaded.
> The Squeak VM uses about a gig of virtual memory (really?) and seems to compete with
> the Jenkins running on the server. htop says Jenkins uses 25% of the system's memory
> while Squeak uses 19% (both of which I deem high).
>   So in the event of some Jenkins jobs firing off and Squeaksource answering some requests,
> the server might become unresponsive?
> 

Something like that, I think. I'm not sure what was generating the load, although
there is no question that adding squeaksource to box3 adds a significant resource
demand above that of the Jenkins jobs.
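
For reference, the three figures in the uptime output are the 1-, 5- and
15-minute load averages, so 0.70, 4.74, 8.52 reads as a load that was high
about a quarter of an hour earlier and had mostly subsided by the time uptime
was run. A minimal sketch of pulling those numbers out of the line quoted
above follows. The function names and the "falling"/"rising" labels are
illustrative assumptions, not an existing tool on box3.

```python
import re


def parse_load_averages(uptime_line):
    """Return the (1-min, 5-min, 15-min) load averages from an uptime line."""
    match = re.search(
        r"load averages?:\s*([\d.]+),?\s+([\d.]+),?\s+([\d.]+)", uptime_line
    )
    if match is None:
        raise ValueError("no load average found in: %r" % uptime_line)
    return tuple(float(value) for value in match.groups())


def load_trend(one, five, fifteen):
    """Describe the direction of the load (labels are an assumption)."""
    if one < five < fifteen:
        return "falling"
    if one > five > fifteen:
        return "rising"
    return "mixed"


# The uptime line exactly as quoted in this thread.
line = " 16:32:49 up 166 days, 22 min,  1 user,  load average: 0.70, 4.74, 8.52"
one, five, fifteen = parse_load_averages(line)
print(one, five, fifteen, load_trend(one, five, fifteen))
```

A load average is most meaningful relative to the number of CPU cores, which
is not stated in this thread, so the sketch only reports the direction of the
trend.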

Allocating a big address space (1G) is normal for the VM, and in this case the
image is actually using a bit under 200MB, which is 20% of the system memory.
If there is some combination of squeaksource and jenkins activity that pushes
the total demand to the point of requiring swapping, then it's possible that
this would make the system unresponsive as I was seeing.
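
One way to check the swapping theory the next time this happens would be to
look at how much swap is in use. The following is only a sketch, assuming box3
is a Linux host that exposes /proc/meminfo with the usual SwapTotal and
SwapFree fields (values in kB). The helper names are hypothetical and the
sample numbers are made up for illustration; they are not measurements from
box3.

```python
import os


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of field name -> integer value."""
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def swap_used_kb(fields):
    """Swap in use, in kB; missing fields are treated as 0."""
    return fields.get("SwapTotal", 0) - fields.get("SwapFree", 0)


# Hypothetical sample values, NOT taken from box3.
SAMPLE = """MemTotal:        1024000 kB
MemFree:           51200 kB
SwapTotal:        524288 kB
SwapFree:         393216 kB
"""

print("sample swap used (kB):", swap_used_kb(parse_meminfo(SAMPLE)))

# On a Linux host, report the live value as well.
if os.path.exists("/proc/meminfo"):
    with open("/proc/meminfo") as handle:
        print("live swap used (kB):", swap_used_kb(parse_meminfo(handle.read())))
```

A swap figure that grows while Jenkins jobs and squeaksource requests overlap
would support the explanation above; a figure that stays near zero would point
elsewhere.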

A number of the Jenkins jobs run squeak VMs in addition to the Java stuff,
so some combination of these might add up to a problem.

Dave


