Janko and I have been doing some fiddling around and monitoring today, and I am beginning to develop a theory.
First, meet some images:
Image 1: SqueakMap (map.squeak.org). This is our oldest Squeak service image, with the possible exception of wiki.squeak.org. It is a 3.8 image running on a 3.8 VM. SqueakMap saves its data to a separate file, and snapshots of the image are only made when source code changes occur, i.e., manually.
Image 2: SqueakSource (source.squeak.org). This is a 3.11 image running on a 3.11 VM. This image saves its data by snapshotting hourly.
Image 3: www.squeak.org. This is a 3.9 image that had been running for some time on the same 3.8 VM as SqueakMap; a short while ago we decided to try it on the 3.11 VM that SqueakSource uses, and it has been running fine so far.
So for a few hours now I have been monitoring the memory usage and GC behavior of these three images. The short story is that SqueakMap is almost completely rock solid, but the SqueakSource image, despite my earlier claims to the contrary, appears to suffer a much-reduced form of the problem we are observing on www.squeak.org.
To reiterate: the problem is that the resident set size (RSS) of the process appears to grow continually. The operating system will occasionally swap some of this out, giving the impression that the RSS has dropped, but I'm almost certain that the total cost (RAM + swap) of the process stays the same and continues to increase. With all the various services we are running, the available RAM and swap on the server are quite limited.
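For anyone who wants to watch this themselves, one way to track the total cost (RSS plus swap) of a process from the outside is to read /proc/<pid>/status on Linux. This is just a sketch of that approach, not the exact tooling I used; note that the VmSwap line is only reported by newer kernels, so it is treated as optional here.

```python
# Sketch: read a process's resident and swapped-out size from
# /proc/<pid>/status on Linux. VmSwap is only present on newer
# kernels, so it defaults to 0 when missing.
import os
import re

def memory_cost(pid):
    """Return (rss_kb, swap_kb) for `pid`."""
    rss_kb, swap_kb = 0, 0
    with open("/proc/%d/status" % pid) as f:
        for line in f:
            m = re.match(r"(VmRSS|VmSwap):\s+(\d+)\s+kB", line)
            if m:
                if m.group(1) == "VmRSS":
                    rss_kb = int(m.group(2))
                else:
                    swap_kb = int(m.group(2))
    return rss_kb, swap_kb

if __name__ == "__main__":
    # Demo on our own PID; point it at the VM's PID in practice.
    rss, swap = memory_cost(os.getpid())
    print("RSS: %d kB, swap: %d kB, total: %d kB" % (rss, swap, rss + swap))
```

Run it against the VM's PID in a loop and the total should tell you whether the swapped-out pages are really gone or just hidden.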
What I'm observing is a discrepancy between what the VM claims is the end of memory and the RSS. When any of these images starts, this discrepancy is quite small, maybe a megabyte. But whenever a GC occurs, the end-of-memory figure drops while the RSS goes up. So from the image's side it all appears copacetic: we cleaned up the garbage and we are more or less back at our base memory usage level. But as far as the operating system is concerned nothing has happened; in fact, more memory has been claimed.
So my theory is that somehow, when a GC occurs on the Linux VMs (both the 3.8 and 3.11 VMs), the freed-up memory is not released to the operating system, and new objects seem to go into a different memory region from the operating system's perspective, therefore using additional memory on top of what was used before the GC. I only really observe this on full GCs. When few if any full GCs occur, there is no real problem. But when you do things like snapshot the image hourly, it all adds up to several megabytes a day of lost RAM.
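The general effect is easy to demonstrate in other runtimes, for what it's worth: user-space allocators typically keep freed pages on their own free lists rather than returning them to the kernel, especially when the heap is fragmented, so RSS stays high even after the program has "freed" a lot of memory. Here is a small Python (CPython) demonstration of that pattern; this is an analogy, not a claim about what the Squeak VM specifically does internally.

```python
# Sketch of the suspected effect in a different runtime (CPython):
# freeing fragmented small objects usually does not shrink RSS,
# because the allocator holds onto the pages instead of handing
# them back to the kernel. Linux-only (reads /proc).
import os

def rss_kb():
    """Current resident set size in kB, from /proc/<pid>/status."""
    with open("/proc/%d/status" % os.getpid()) as f:
        for line in f:
            if line.startswith("VmRSS:"):
                return int(line.split()[1])
    return 0

before = rss_kb()
# Allocate around a million small objects...
objs = [str(i) for i in range(1000000)]
after_alloc = rss_kb()
# ...then "collect" every other one, leaving the heap fragmented.
del objs[::2]
after_free = rss_kb()

print("before: %d kB, after alloc: %d kB, after free: %d kB"
      % (before, after_alloc, after_free))
```

On my understanding, RSS after the deletion stays close to the post-allocation figure, which is the same shape of behavior I'm seeing after a full GC: the image thinks the memory is free, the OS disagrees.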
This may not be quite right; I'm still observing, and of course this theory is built out of black box observation and little else. Feel free to set me straight.
Ken
vm-dev@lists.squeakfoundation.org