Memory analysis

John M McIntosh johnmci at smalltalkconsulting.com
Tue Jun 21 16:39:04 UTC 2005


Look for the thread
"Garbage Collection in Squeak" starting Oct 29th, 2004
(I'll attach the original note).

The changeset allows you, with a current VM, to monitor memory usage by  
Squeak and what the garbage collector is doing.
Although it is not designed to work at the class level, it will show  
over time how the memory footprint of the VM changes.
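
If you do want a class-level breakdown to go with this, the stock
SpaceTally tool is a reasonable complement. A minimal workspace sketch,
assuming a standard 3.8 image (the report lands in a text file in the
default directory, named something like 'STspace.text'):

"Class-level space breakdown using the stock SpaceTally tool (sketch)"
SpaceTally new printSpaceAnalysis.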



The changeset and Macintosh VM can be found at
http://homepage.mac.com/johnmci
in the "GC work" directory.


In order to collect information about how the GC is working:

a) Take a test image and file in
JMMGCMonitor.4.cs

b) Run this altered image with the Mac Carbon VM
Squeak 3.8.4Beta1.app or higher.

c) In a workspace, do:

GCMonitor run.

"do your test here"

GCMonitor stop. "When you stop your testing"


This will produce a file "gcStats.txt" which contains statistical  
data collected from the GC logic every 100 milliseconds.  Please  
email the file to johnmci at mac.com with a bit of explanation of what you  
tested, and please zip the file to reduce its size.

Now, to test with some active tuning, please re-run your test, but do:

GCMonitor runActive.

"do your test here"

GCMonitor stop. "When you stop your testing"

Note that you must save a copy of gcStats.txt before you do the  
runActive, because it will overwrite any previous results, such as the  
earlier results of doing "run". Please email the file, note that it  
was the result of runActive, and mention any observations about
pauses or odd behavior.


On 20-Jun-05, at 9:48 PM, David Mitchell wrote:

>
>
>>>
>>> Of course, you need enough memory to handle everything. 2GB  
>>> might  be a lot or not enough, depending on what each one of  
>>> those 10,000  or 500 records are doing. Looking at just the  
>>> 10000. There is  probably an uneven distribution (different  
>>> customers will have  different usage patterns). Maybe 60% of that  
>>> traffic wants to enter  during your peak hour, so 6,000 per hour  
>>> at say 1:00 P.M. local  time. If they timeout at 20 minutes...
>>>
>>>
>>
>> That's an interesting issue. I would need to carefully monitor  
>> memory  usage. Are there any docs that can help me incorporate  
>> some memory  usage analysis of squeak on a per application basis?
>>
>
> I haven't done anything serious in Squeak, but I googled: squeak  
> memory
> usage
>
> and came across this interesting site:
>
> http://www.visoracle.com
>
> Which seems to catalog some threads from this list. Found this:
>
> http://www.visoracle.com/squeakfaq/image-size.html
>
> Under the category of:
>
> Squeak Smalltalk : Tools Tricks Usage : Image Size Space Tally
> Print Analysis
>
> --David

As some of you know, I from time to time do serious GC tuning in  
VisualWorks.
http://www.smalltalkconsulting.com/papers/GCPaper/GCTalk%202001.htm

For years now (yes, years) I've been thinking about doing serious  
tuning in Squeak, but things just weren't interesting enough.  
However, with the release of Croquet, thoughts on TK4, and mumblings  
from folks, I turned my eye upon what the Squeak GC was doing.  First,  
thanks to Ted Kaehler (among others) for writing this GC 10-odd years  
back; we've run it with little change over the years, until we  
added the ability to shrink/grow the memory space a few years back.

First I'll need some help from Macintosh users who have interesting  
images and can run a pending 3.8.4 Mac VM with instrumentation
and then send me some diagnostic logs.  People who are willing to help  
should email me directly; I'll get you a VM and changeset, and we'll
see if we can fill up my gmail account with log data. I'm very  
interested in getting Croquet data; having a test case you can run  
before/after would
be best.

So let's review how the Squeak GC works:

A simplistic view is that your image (25MB) is loaded into Old  
Space, and 4MB (set in Smalltalk) is allocated
for Young Space. After allocating 4000 new objects (if they fit; the  
limit is set in Smalltalk) we do a mark/sweep/compacting GC on young  
space,
using the VM roots plus roots identified from Old Space to Young  
Space references.  Normally this means looking at only a few thousand
objects; occasionally we might do a full GC across all 300K  
objects based on various conditions, but that's rare.
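
If you want to peek at the raw numbers yourself without the changeset,
the VM exposes them through vmParameterAt:. A rough workspace sketch,
assuming the classic interpreter VM parameter numbering documented in
SystemDictionary>>vmParameterAt: (2 = end of young space, 7/8 = full GC
count and milliseconds, 9/10 = incremental GC count and milliseconds):

"Poke at the raw GC counters (parameter numbering assumed, check your image)"
Transcript
    cr; show: 'young space end: ', (Smalltalk vmParameterAt: 2) printString;
    cr; show: 'full GCs / ms:   ', (Smalltalk vmParameterAt: 7) printString,
        ' / ', (Smalltalk vmParameterAt: 8) printString;
    cr; show: 'incr GCs / ms:   ', (Smalltalk vmParameterAt: 9) printString,
        ' / ', (Smalltalk vmParameterAt: 10) printString.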

I'll note that the Old Space to Young Space remembered table remembers the  
object containing the reference, not
the reference itself, so if you have a 100,000-element collection we  
iterate over all 100,000 entries looking for the 1 or
more old-to-young references, a cause for performance concern.
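
You can feel this effect from a workspace. A rough sketch (the sizes and
names are just for illustration, and timings will vary): tenure one large
Array with a full GC, drop a single young object into it, then time a
burst of allocations; every incremental GC now has to walk all 100,000
slots of that one array looking for the young reference.

"Remembered-table cost illustration (sketch only)"
| big |
big := Array new: 100000.
Smalltalk garbageCollect.      "full GC: big is now in old space"
big at: 1 put: Object new.     "old -> young reference puts big in the root table"
Transcript cr; show:
    ([300 timesRepeat: [Array new: 1000]] timeToRun) printString, ' ms'.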

If, after completing the mark/sweep/compaction, over 2000 objects  
(set in Smalltalk) are survivors, we "tenure" them to
old space by moving the old/young pointer boundary, then start  
allocating objects again. Mind you, if young space
now exceeds 8MB free (set in Smalltalk), we give memory back and  
reset things to the 4MB boundary.
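
Both of those thresholds are visible from the image. A hedged sketch,
again assuming the standard vmParameterAt: numbering (5 = allocations
between incremental GCs, 6 = the survivor count that triggers tenuring);
check the comment in SystemDictionary>>vmParameterAt: for your image:

"Read the two image-side GC thresholds (numbering assumed)"
Smalltalk vmParameterAt: 5.    "allocations between incremental GCs, e.g. 4000"
Smalltalk vmParameterAt: 6.    "survivor count that triggers tenuring, e.g. 2000"
"writable parameters can be changed with vmParameterAt:put:, e.g.
 Smalltalk vmParameterAt: 5 put: 8000"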

Now, object allocation can end early because we are allocating a big  
object; in that case we do a
young GC, followed by a full GC (very expensive), followed by growing  
young space so the big object can
fit.   Also, of course, if we run out of root (remembered) table space, we  
end the allocation process early and think about tenuring things.

Or it can end if we drop below a minimum amount of free memory (200K,  
set by Smalltalk), which makes the image grow; if that
fails, the low space semaphore is signaled (and brings up a low space  
dialog which no one really sees and which people have bitterly complained about).
Yes, I now know why that occurs too...
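
For the curious, the image-side pieces of that machinery look roughly
like this (a sketch assuming the stock SystemDictionary protocol of a
3.8 image):

"Where the low-space machinery shows up in the image (sketch)"
Smalltalk lowSpaceThreshold.       "minimum free bytes before the semaphore is signaled"
Smalltalk bytesLeft.               "free bytes available right now"
Smalltalk installLowSpaceWatcher.  "(re)start the process that waits on the semaphore"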

The Problem:

Last weekend I built a new VM which has instrumentation to describe  
exactly what the GC is doing, to
trigger a semaphore when a GC finishes, and to allow you to poke at  
more interesting things that control GC activity.

What I found was an issue we hadn't realized was there; well, I'm  
sure people have seen it, but didn't know why...
What happens is that as we tenure objects we decrease  
young space from 4MB towards zero.

Now, as indicated in the table below, if conditions are right (a couple  
of cases in the macrobenchmarks), the
number of objects we can allocate decreases towards zero, and we actually  
don't tenure anymore once the survivors fall below 2000.
The rate of young space GC activity goes from, say, 8 per  
second towards 1000 per second. Mind you, on fast machines
the young space millisecond accumulation count doesn't move much because the  
time taken to do each collection is under 1 millisecond, or 0, skewing
those statistics and hiding the GC time.

AllocationCount    Survivors
           4000         5400
           3209         3459
           2269         2790
           1760         1574
           1592         2299
           1105         1662
            427         2355
            392         2374
            123         1472
             89         1478
             79            2
             78            2
             76            2
             76            2

Note how we allocate 76 objects, do a young space GC, and have two  
survivors; finally we reach the 200K minimum GC
threshold and do a full GC followed by growing young space. However, this  
process is very painful. It's also why the low space dialog
doesn't appear in a timely manner: as we approach the 200K limit we  
try really hard, by doing thousands of
young space GCs, to avoid going over it. If conditions are  
right, we get close but not close enough...
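
One way to watch this death spiral without the changeset is to sample the
incremental GC counter over a fixed interval: the count climbs into the
hundreds or thousands while the accumulated milliseconds barely move. A
rough sketch, using the same assumed parameter numbering as above:

"Sample the incremental GC rate for one second (sketch; numbering assumed)"
| gcs ms |
gcs := Smalltalk vmParameterAt: 9.     "incremental GCs since startup"
ms := Smalltalk vmParameterAt: 10.     "milliseconds spent in them"
(Delay forSeconds: 1) wait.
Transcript cr; show:
    ((Smalltalk vmParameterAt: 9) - gcs) printString, ' IGCs in the last second, ',
    ((Smalltalk vmParameterAt: 10) - ms) printString, ' ms counted'.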

What will change in the future:

a) A new GC monitoring class will look at mark/sweep/root table  
counts and decide when to do a tenure operation if iterating
over the root table objects takes too many iterations. A better  
solution would be to remember both the old object and which slot holds the  
young reference, but that is harder to do.

b) A VM change will consider that, after a tenure, if young space  
is less than 4MB, then growth will happen to make young space greater  
than 4MB plus a calculated slack. Then, after we've tenured N MB, we  
will do a full GC, versus doing a full GC on every grow operation;  
this will trigger a shrink if required.  For example, we'll tenure at  
75% and be biased to grow to 16MB before doing a full GC.
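
Purely as an illustration of what such a knob could look like from the
image side (these selector names are my assumption of a future interface,
not something the VM this note describes already supports):

"Hypothetical image-side switches for the bias-to-grow policy (names assumed)"
Smalltalk setGCBiasToGrow: 1.                        "turn the policy on"
Smalltalk setGCBiasToGrowGCLimit: 16 * 1024 * 1024.  "full GC after ~16MB of tenured growth"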

c) To solve hitting the hard boundary when we cannot allocate more  
space, we need to rethink when the low space semaphore is signaled and the  
rate of young space GC activity; signaling the semaphore earlier will  
allow a user to take action before things grind to a halt. I'm not  
quite sure how to do that yet.


Some of this might be back-ported to earlier VMs, I think, yet I  
won't know until we gather more data and try a few things.


--
===========================================================================
John M. McIntosh <johnmci at smalltalkconsulting.com> 1-800-477-2659
Corporate Smalltalk Consulting Ltd.  http://www.smalltalkconsulting.com
===========================================================================



