The answer is yes, but of course source.squeak.org is much smaller in terms of the data it has to handle, and its traffic level is much lower, so its swings in memory usage are rarely a significant problem.
I should have said something more when you started discussing this, and I apologize for my level of silence. Given the amount of trouble the original owners had with SqueakSource.com, I don't see how we can expect to do better, particularly with a virtual server with only 1GB of RAM allocated to it. Setting it to read-only (and I'm not sure, but perhaps that is how it is set now) will of course reduce the load. But by how much? That's not an easy question to answer.
Ken
On 10/08/2013 01:08 PM, David T. Lewis wrote:
Aha. That would do it for sure. So something is going on in the squeaksource image that is using a *lot* of object memory for some period of time. The use of 957m resident memory would very likely be enough to cause the symptoms that we saw.
Do you know if we see any similar pattern of memory usage on the source.squeak.org server? I'm already convinced that the squeaksource.com image badly needs to be updated to the same level of Squeak/Seaside/SqueakSource as our source.squeak.org server (due to socket leak problems if nothing else).
I also recall that squeaksource.com on the SCG server had horrible performance problems whenever we tried to commit a large MCZ to the VMMaker repository, and I always assumed that it was memory related in some way or another.
Thanks a lot Ken,
Dave
On Tue, Oct 08, 2013 at 12:37:17PM -0500, Ken Causey wrote:
oom_killer
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
12929 ssdotcom 20 0 1028m 957m 784 R 99.9 95.0 12:05.45 /usr/local/lib/squeak/4.10.5-2619/squeakvm -vm-display-null squeaksource.2.image
Note the 6th field (RES, the resident memory): 957m.
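[Editorial sketch, not a command from the thread: one way to catch a spike like the 957m one above is to log the VM's resident set size periodically. The image name "squeaksource.2.image" is taken from the top line above; the log path is my own invention.]

```shell
#!/bin/sh
# Sketch only: log the resident set size (the RES column in top) of the
# squeaksource VM once a minute, so a memory spike leaves a timestamped trace.

rss_kb() {
    # ps reports RSS in kilobytes on Linux
    ps -o rss= -p "$1" | tr -d ' '
}

# Find the VM running the squeaksource image (name taken from the top output)
PID=$(pgrep -f 'squeaksource.2.image' | head -n 1)

while [ -n "$PID" ] && kill -0 "$PID" 2>/dev/null; do
    echo "$(date '+%F %T') $(rss_kb "$PID") kB" >> /var/tmp/squeak-rss.log
    sleep 60
done
```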
Ken
On 10/08/2013 12:22 PM, Frank Shearar wrote:
On 8 October 2013 17:48, David T. Lewis <lewis@mail.msen.com> wrote:
On Tue, Oct 08, 2013 at 04:33:28PM +0100, Frank Shearar wrote:
On 6 October 2013 19:00, David T. Lewis <lewis@mail.msen.com> wrote:
On Sun, Oct 06, 2013 at 11:18:25AM -0400, David T. Lewis wrote:
> On Sun, Oct 06, 2013 at 04:52:21PM +0200, Tobias Pape wrote:
>>
>> So, uptime said:
>> root@box3-squeak:/home/ssdotcom# uptime
>>  16:32:49 up 166 days, 22 min, 1 user, load average: 0.70, 4.74, 8.52
>>
>> And the _last two_ numbers are concerning. Basically, the server was
>> overloaded. The Squeak VM uses about a gig of virtual memory (really?)
>> and seems to compete with the jenkins running on the server. htop says
>> jenkins uses 25% of the system's memory while Squeak uses 19% (both of
>> which I deem high).
>> So in the event of some jenkins jobs firing off and SqueakSource
>> answering some requests, the server might become unresponsive?
>>
>
> Something like that, I think. I'm not sure what was generating the load,
> although there is no question that adding squeaksource to box3 adds a
> significant resource demand above that of the Jenkins jobs.
>
> Allocating a big address space (1G) is normal for the VM, and in this
> case the image is actually using a bit under 200MB, which is 20% of the
> system memory. If there is some combination of squeaksource and jenkins
> activity that pushes the total demand to the point of requiring swapping,
> then it's possible that this would make the system unresponsive as I was
> seeing.
>
> A number of the Jenkins jobs run squeak VMs in addition to the Java
> stuff, so some combination of these might add up to a problem.
>
I am now running top every 30 seconds for the next 24 hours, with output directed to ~ssdotcom/tmp/top.out. Possibly this will show us something interesting.
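[Editorial sketch: the thread does not show the exact invocation Dave used, but a 30-second sample over 24 hours maps naturally onto top's batch mode. The arithmetic below (2880 iterations) is my reconstruction.]

```shell
#!/bin/sh
# Sketch of a 24-hour top capture at 30-second intervals (assumed
# invocation, not quoted from the thread).

INTERVAL=30                                  # seconds between samples
HOURS=24
ITERATIONS=$(( HOURS * 3600 / INTERVAL ))    # 2880 samples over 24 hours

# -b: batch mode (plain text, no terminal control sequences)
# -d: delay between samples; -n: number of samples, then exit
top -b -d "$INTERVAL" -n "$ITERATIONS" > "$HOME/tmp/top.out"
```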
If that's still running, it's probably saying "ow! ow! stop it!" right now. If Tony Garnock-Jones & I could figure out why jobs are failing on his slaves, I'd suggest moving builds off the box entirely. I'll probably turn my old laptop into a build slave... once I can get it up & running again. That too will help with keeping work off the box.
No worries, I only ran it for a 24 hour period. I saw occasional load increases, but nothing like the "load average: 0.70, 4.74, 8.52" that Tobias spotted right after the outage. I think we just need to keep our eyes open for problems in case it comes back ... usually if a thing can fail once, it will fail again eventually ;-)
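[Editorial sketch: scanning a batch-mode top log for load spikes like the "0.70, 4.74, 8.52" one can be done with a short awk filter. The filename top.out and the 2.0 threshold are my own choices.]

```shell
# Print only the top header lines whose 1-minute load average exceeds 2.0.
# Batch-mode top emits a header like:
#   top - 16:32:49 up 166 days, ... load average: 0.70, 4.74, 8.52
awk -F'load average: ' '/load average:/ {
    split($2, la, ", ")        # la[1..3] = 1-, 5-, 15-minute loads
    if (la[1] + 0 > 2.0)       # "+ 0" forces numeric comparison
        print $0
}' top.out
```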
Hm, OK, so build.squeak.org could be unavailable for a different reason!
frank
Dave