[Box-Admins] box3.squeak.org offline - HELP needed

Frank Shearar frank.shearar at gmail.com
Tue Oct 8 17:22:06 UTC 2013


On 8 October 2013 17:48, David T. Lewis <lewis at mail.msen.com> wrote:
> On Tue, Oct 08, 2013 at 04:33:28PM +0100, Frank Shearar wrote:
>> On 6 October 2013 19:00, David T. Lewis <lewis at mail.msen.com> wrote:
>> > On Sun, Oct 06, 2013 at 11:18:25AM -0400, David T. Lewis wrote:
>> >> On Sun, Oct 06, 2013 at 04:52:21PM +0200, Tobias Pape wrote:
>> >> >
>> >> > So, uptime said:
>> >> > root@box3-squeak:/home/ssdotcom# uptime
>> >> >  16:32:49 up 166 days, 22 min,  1 user,  load average: 0.70, 4.74, 8.52
>> >> >
>> >> > And the _last two_ numbers are concerning. Basically, the server was overloaded.
>> >> > The Squeak VM uses about a gig of virtual memory (really?) and seems to compete with
>> >> > the Jenkins running on the server. htop says Jenkins uses 25% of the system's memory
>> >> > while Squeak uses 19% (both of which I deem high).
>> >> >   So in the event of some Jenkins jobs firing off and Squeaksource answering some requests,
>> >> > the server might become unresponsive?
>> >> >
>> >>
>> >> Something like that I think. I'm not sure what was generating the load, although
>> >> there is no question that adding squeaksource to box3 adds a significant resource
>> >> demand above that of the Jenkins jobs.
>> >>
>> >> Allocating a big address space (1G) is normal for the VM, and in this case the
>> >> image is actually using a bit under 200MB, which is 20% of the system memory.
>> >> If there is some combination of squeaksource and jenkins activity that pushes
>> >> the total demand to the point of requiring swapping, then it's possible that
>> >> this would make the system unresponsive as I was seeing.
>> >>
>> >> A number of the Jenkins jobs run squeak VMs in addition to the Java stuff,
>> >> so some combination of these might add up to a problem.
>> >>
>> >
>> > I am now running top every 30 seconds for the next 24 hours, with output directed
>> > to ~ssdotcom/tmp/top.out. Possibly this will show us something interesting.
>>
>> If that's still running, it's probably saying "ow! ow! stop it!" right
>> now. If Tony Garnock-Jones & I could figure out why jobs are failing
>> on his slaves, I'd suggest moving builds off the box entirely. I'll
>> probably turn my old laptop into a build slave... once I can get it up
>> & running again. That too will help with keeping work off the box.
>>
>
> No worries, I only ran it for a 24 hour period. I saw occasional load
> increases, but nothing like the "load average: 0.70, 4.74, 8.52" that
> Tobias spotted right after the outage. I think we just need to keep
> our eyes open for problems in case it comes back ... usually if a thing
> can fail once, it will fail again eventually ;-)

Hm, OK, so build.squeak.org could be unavailable for a different reason!

frank

> Dave
>

