[Box-Admins] box3.squeak.org off line - HELP needed

Wed Oct 9 01:52:37 UTC 2013

On Tue, Oct 08, 2013 at 07:54:05PM -0500, Chris Muller wrote:
> I just deployed an updated image for "new trunk" at
> box4.squeak.org:8888/trunk.  This is the one that uses the Magma
> backend.  I haven't done any benchmarks but the response time seems
> pretty snappy, even compared to regular source.squeak.org.  Could be
> due to Cog, the Magma backend, or box4 being under less load than
> box3.

It might be all of the above. I'm thinking that the priority for the
squeaksource.com image is to first get it up to date to the level of
source.squeak.org, then at our leisure we can bring both of them together
up to the level that you are demonstrating with Cog and Magma. But
the main thing is to get squeaksource.com stable, which is not yet the case.

> 
> I need to tell you -- I did experience a 1GB memory situation myself
> the other day!  No idea how it could happen especially since I'm
> doubtful there was any significant level of activity.
> 
> It was strange, but I simply restarted it.  Now I'll really keep my eyes open.

I'm not sure what causes those large memory usage blips, but at some point
we will need to figure it out.

Meanwhile, I took a look at the running squeaksource.com image, which as
Ken pointed out was using nearly 1GB of memory, and which oh by the way
had crashed about 20 times in the last day or so with out-of-memory errors.

It turns out that the following snippet from the workspace that SCG provided
took care of the problem for now:

	" kill runaway processes "
	ProcessBrowser open.
	Process allInstances do: [ :each |
		each priority = 30 ifTrue: [ 
			each terminate ] ].

So I'm guessing that some Seaside handler got wedged, presumably concurrent
with an excessive memory usage condition, and it probably failed in some way
that did not let the garbage collector clean up the mess.
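
A more cautious variant of that snippet might list the priority 30 processes
before terminating them, and then request a full garbage collect. This is only
a sketch: it assumes that the wedged handlers all run at priority 30, that
nothing else worth keeping runs at that priority, and that it is evaluated from
a workspace in the running image. The variable name "suspects" is made up for
illustration.

	| suspects |
	" collect the candidates instead of terminating them blindly "
	suspects := Process allInstances select: [ :each |
		each priority = 30 and: [ each isTerminated not ] ].

	" show what is about to be killed "
	suspects do: [ :each |
		Transcript show: each printString; cr ].

	" terminate them, drop our own references, then ask for a full GC "
	suspects do: [ :each | each terminate ].
	suspects := nil.
	Smalltalk garbageCollect.

Whether the garbage collect actually reclaims the memory depends on nothing
else holding references to the objects those processes were using.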

I cleaned it up for the time being on our box3 server, and things seem to be
back to "normal" (aka waiting for the next failure).

So my non-scientific, hand-waving summary is:

- Stuff happens that causes SqueakSource to temporarily allocate a lot of
object memory, possibly in response to some request coming in through Seaside.

- Whenever the aforementioned stuff fails for some reason, the old squeaksource.com
image craps out horribly and keeps references to whatever was going on at
the time of the crap out.

- The newer source.squeak.org image probably is doing the same bad things
with respect to using large gobs of object memory, but it probably fails
in reasonable ways that free up object allocations and clean up socket
handle references, etc.

Therefore: We should continue to manually monitor the squeaksource.com image
on box3, and clean up whatever messes it causes as best we can. But with
high priority, the next step is to update squeaksource.com to use the
source.squeak.org image, and verify that this puts us into a stable
state of affairs. If and when that is successfully accomplished, we should
be able to update both images to take advantage of Magma and Cog, etc.

Dave