Large array slows down Squeak
mparker at CaseServices.com
Thu Jun 1 20:52:44 UTC 2000
> At 08:51 31.05.00 -0500, you wrote:
> > > And there's another interesting observation. In Morphic,
> the delay
> > > because of tenuring is much larger than in MVC. This is probably
> > > because Morphic creates much more short-lived object garbage.
> >Umm, 'splain again why tenuring would make Morphic feel slow?
> My guess was that these objects aren't short-lived enough.
> If they survive
> one GC, they need to be copied - and this might take quite a few
> milliseconds. I'd guess that Morphic creates too many Point
> and Rectangle
> objects just to draw one screen. And as there are most often
> objects which require continuous screen updates, it will GC
> quite often.
> So perhaps a "Smalltalk vmParameterAt: 5 put: 10000" will
> reduce the
> number of GCs. Could you please explain why increasing this
> value has problems?
I'm certainly not an expert on the Squeak collector, so one of
the authors of the collector may want to step in and correct me,
but here goes: The memory system reserves 4000 words at the end
of eden to be used for storing relocation records. This is enough
to relocate a minimum of 2000 objects. If there is additional
spare free memory at the end of eden, it will also be available
for relocation records. If I'm reading the code correctly, if it
runs out of relocation records, it only compacts the last
contiguous run of surviving objects up against the next-to-last
contiguous run of survivors, then tries the compaction process
again. If compacting that one survivor run freed up enough memory
at the end of eden space that there is now enough room for all
the relocation records, then this second compaction pass will
complete normally. If not, then it again falls back to compacting
the last two runs together. Needless to say, this can get really
slow, and (I think, I need to refresh my memory when I get back
home) contains the possibility that a minor compaction may fail
to terminate (if the last contiguous run contains more objects
than there are relocation records). At any rate, the combined
affect of the 4000 object gc threshold and the 2000 object tenuring
threshold is that these pathological cases don't happen very often,
at the cost of (imho) excessively frequent minor collections,
as well as reduced flexibility in the scavenging and tenuring
> >Creating lots of temporary objects will trigger lots more minor
> >GC's which could make it run slower.
> Yes, that was my argument. To prove it, I created a simple test for
> morphic. Without any interaction and just a workspace (and of
> course my
> InfoStringMorph) open, I get about 2 incremental GCs per second.
> Selecting text in the workspace raises the GCs/sec to 25!
> If I increment the object count to 10000, I get only 9 GCs/sec when
> selecting text in that same workspace.
About what I would have expected. OTOH, each minor GC will take a
bit longer, but as long as your code matches the generational hypothesis
the GC time doesn't increase linearly.
> I think, this proves that objects live too long.
Not really. It just proves that forcing a minor GC every 10000
allocations will trigger fewer GC's than triggering one every 4000
allocations. 9/25 is roughly equal to 4/10 (2.7 vs 2.5)
> My system spends about 10 milliseconds or 1% of all time in GCs
> per second. (0.3% if count = 10000)
One of the goals of Squeak's allocator (and of scavenging
collectors in general) is to reduce GC-induced latency to
<10ms, short enough that you shouldn't notice it if it happens
while you're tracking the mouse.
If your numbers are right, and you're only spending about 1%
of your time in the collector when you're in Morphic, then GC
overhead wouldn't appear to be the cause of Morphic's slowness
relative to MVC.
> If I allocate a one-million-elements array, spends about 230
> milliseconds or 23% of all time in GC per second (12% if count
> = 10000)! This means, the incremental GC doesn't scale well
> at all.
Yep, this isn't a good thing, but Squeak isn't the first system
to dislike large pointer-containing objects. They violate some
of the fundamental assumptions underlying generational scavenging
Still, 23% GC overhead in this case isn't too bad if you think
about it. Under normal circumstances Squeak tries to keep < 2000
objects in edenspace. Assuming 25-30 bytes/object that's only
about 60k on average. Suddenly you inject about six thousand
times that much data into eden but the collection time only
increases by a factor of 23!
Actually, 23% overhead isn't bad anyway in the grand scheme of
things. I'm pretty sure it was Benjamin Zorn who profiled a bunch
of allocation schemes (mark & sweep, mark & sweep & compact, mark &
sweep & compact with a separate mark bit array, stop & copy,
generational scavenging, as well as a variety of malloc/free
implementations), and discovered several real-world C programs
that had memory management overheads greater than 40%, while most
commonly used garbage collectors managed much lower overhead
under much higher allocation loads.
Prematurely tenuring large objects isn't really a good answer
either, because object size is a poor predictor of object
lifetime, so when the references to this object disappears
it'll become a huge chunk of nothing clogging up your oldspace,
and partly because writes into tenured objects are much more
expensive than writes into untenured objects. This may not be
so noticeable in a system like Squeak, where the interpreter
overhead can mask the 20-40 instruction overhead, but it's much
more noticeable in a compiled, optimized system like SML or
Self. Another problem is that unless all the objects in this
array are also tenured, the collector will still have to scan
the array every minor GC looking for pointers to objects in eden.
Unfortunately, there is no universal solution. Industrial-
strength collectors tend to have whatzit-knobs and tommybars all
over them to allow them to be tuned to the particular application,
but to my knowledge no collection scheme deals gracefully with
anything but an idealized "general case".
I don't know how interested you are in persuing this, but I'd
be interested to know what the GC overhead was in the case where
you did the major GC just before allocating the large array (so
that it was created at the front of edenspace and therefore
wouldn't need to be moved during compaction).
More information about the Squeak-dev