[Vm-dev] Is it possible to suspend the garbage collector?

Max Leske maxleske at gmail.com
Tue Jan 12 16:39:07 UTC 2016


Hi John,

Thanks for the explanation.

As far as I understand, these settings won’t help in my particular case (I experimented with turning the bias on and off before allocating the ByteArray but didn’t see any influence). Still good to know about them.

I still have a few questions regarding garbage collection.

1. When I allocate a ByteArray, as Eliot suggested, like so:

Smalltalk vmParameterAt: 24 put: 150*1024*1024.
ByteArray new: 70*1024*1024.

then I see that the young space has grown a lot, as expected, and free space is low:

before allocation:
	memory	57,032,988 bytes
	old	44,259,392 bytes (77.6%)
	young	5,824 bytes (0.0%)
	used	44,265,216 bytes (77.6%)
	free	12,767,772 bytes (22.4%)

after allocation:
	memory	117,463,180 bytes
	old	39,666,076 bytes (33.8%)
	young	76,523,392 bytes (65.1%)
	used	116,189,468 bytes (98.9%)
	free	1,273,712 bytes (1.1%)
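
These figures come, as far as I can tell, straight from the VM parameters. Here's a minimal sketch of how I read them, assuming the usual pre-Spur meanings of parameters 1 to 3 (end of old space, end of young space, end of memory):

| oldEnd youngEnd memEnd |
oldEnd := Smalltalk vmParameterAt: 1.	"end of old space"
youngEnd := Smalltalk vmParameterAt: 2.	"end of young space"
memEnd := Smalltalk vmParameterAt: 3.	"end of memory"
Transcript
	show: 'memory ', memEnd printString; cr;
	show: 'old ', oldEnd printString; cr;
	show: 'young ', (youngEnd - oldEnd) printString; cr;
	show: 'used ', youngEnd printString; cr;
	show: 'free ', (memEnd - youngEnd) printString; cr.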


If I now force an incremental GC with #garbageCollectMost then *nothing changes*, i.e. young space is still full and free space is low. Since the incremental GC, as I understand it, marks and sweeps young space, I would expect young space to become small and free space to become large (the memory allocated for the ByteArray moves from young space to free space):

after #garbageCollectMost:
	memory	117,463,180 bytes
	old	39,666,076 bytes (33.8%)
	young	75,339,048 bytes (64.1%)
	used	115,005,124 bytes (97.9%)
	free	2,458,056 bytes (2.1%)

Only once I do a full garbageCollect does the ByteArray get collected:
	memory	121,025,356 bytes
	old	39,703,404 bytes (32.8%)
	young	3,265,428 bytes (2.7%)
	used	42,968,832 bytes (35.5%)
	free	78,056,524 bytes (64.5%)

Why do I see this behaviour? In Cog I see the expected outcome (free space big, young space small) even without triggering an incremental GC manually.
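
For reference, here's roughly how I check the effect of the two collectors, with free space computed from vm parameters 2 and 3 as above:

| freeSpace |
freeSpace := [ (Smalltalk vmParameterAt: 3) - (Smalltalk vmParameterAt: 2) ].
Smalltalk vmParameterAt: 24 put: 150*1024*1024.
ByteArray new: 70*1024*1024.
Smalltalk garbageCollectMost. "incremental GC: young space stays full, free space stays low"
Transcript show: 'free after incremental GC: ', freeSpace value printString; cr.
Smalltalk garbageCollect. "full GC: only now does free space grow"
Transcript show: 'free after full GC: ', freeSpace value printString; cr.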


2. There’s also still the question of where those 500ms go (see my last e-mail); they are unaccounted for in the MessageTally output.
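
For what it’s worth, this is roughly how I compare the wall clock time of the block with what MessageTally reports (the <load snapshot> placeholder stands for the actual loading code, as in the snippet further down):

| start elapsed |
start := Time millisecondClockValue.
MessageTally spyOn: [ <load snapshot> ].
elapsed := Time millisecondClockValue - start.
Transcript show: 'wall clock: ', elapsed printString, ' ms'; cr.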


Cheers,
Max


> On 11 Jan 2016, at 20:03, John McIntosh <johnmci at smalltalkconsulting.com> wrote:
> 
> OK, as the author of setGCBiasToGrowGCLimit & setGCBiasToGrow I was looking for a paper I gave at the OOPSLA 2005 Squeakfest on this, but I can't find it. However, the original note from January 18th, 2005 explains it:
> 
> Mac 3.8.6b4
> 
> Smalltalk setGCBiasToGrowGCLimit: 16*1024*1024. "Set growth limit before full GC to 16MB"
> Smalltalk setGCBiasToGrow: 1. "Set bias to grow up to the GC limit"
> 
> This can avoid a problem where we attempt to avoid growing, which results in thousands of incremental GC events as we approach a knee in the curve of space used versus the growth/compaction decision.
> 
> Plus I added this:
> "A VM change will consider that after a tenure, if the young space is less than 4MB, then growth will happen to make young space greater than 4MB plus a calculated slack. Then after we've tenured N MB we will do a full GC, versus doing a full GC on every grow operation; this will trigger a shrink if required. For example, we'll tenure at 75% and be biased to grow to 16MB before doing a full GC."
> 
> 
> 
> 
> The Problem:
> 
> Last weekend I built a new VM which has instrumentation to describe exactly what the GC is doing, to
> trigger a semaphore when a GC finishes, and to allow you to poke at more interesting things that control GC activity.
> 
> What I found was an issue we hadn't realized is there; well, I'm sure people have seen it, but don't know why...
> What happens is that as we tenure objects we decrease the young space from 4MB to zero.
> 
> Now, as indicated in the table below, if conditions are right (a couple of cases in the macro benchmarks) the
> number of objects we can allocate decreases to zero, and we actually don't tenure anymore once the survivors fall below 2000.
> The rate of young space GC activity goes from, say, 8 per second towards 1000 per second. Mind you, on fast machines
> the young space millisecond accumulation count doesn't move much, because each collection takes under 1 millisecond (or 0), skewing
> those statistics and hiding the GC time.
> 
> AllocationCount         Survivors
> 4000    5400
> 3209    3459
> 2269    2790
> 1760    1574
> 1592    2299
> 1105    1662
> 427     2355
> 392     2374
> 123     1472
> 89      1478
> 79      2
> 78      2
> 76      2
> 76      2
> 
> Note how we allocate 76 objects, do a young space GC, and then have two survivors; finally we reach the 200K minimum GC
> threshold and do a full GC followed by growing young space. However, this process is very painful. It's also why the low space dialog
> doesn't appear in a timely manner: we are attempting to approach the 200K limit and trying really hard, by doing thousands of
> young space GCs, to avoid going over that limit. If conditions are right, then we get close, but not close enough...
> 
> What will change in the future.
> 
> a) A GC monitoring class (new) will look at mark/sweep/root table counts and decide when to do a tenure operation if iterating
> over the root table objects takes too many iterations. A better solution would be to remember old objects and which slot has the young reference, but that is harder to do.
> 
> b) A VM change will consider that after a tenure, if the young space is less than 4MB, then growth will happen to make young space greater than 4MB plus a calculated slack. Then after we've tenured N MB we will do a full GC, versus doing a full GC on every grow operation; this will trigger a shrink if required. For example, we'll tenure at 75% and be biased to grow to 16MB before doing a full GC.
> 
> c) To solve hitting the hard boundary when we cannot allocate more space, we need to rethink when the low space semaphore is signaled and the rate of young space GC activity; signaling the semaphore earlier will allow a user to take action before things grind to a halt. I'm not quite sure how to do that yet.
> 
> 
> On Mon, Jan 11, 2016 at 3:21 AM, Max Leske <maxleske at gmail.com> wrote:
>  
> 
>> On 10 Jan 2016, at 20:22, vm-dev-request at lists.squeakfoundation.org wrote:
>> 
>> Hi Max,
>> 
>>    pre-Spur, to avoid GC one has to a) grow memory by enough to do all the
>> processing you're going to do and b) change the shrinkage parameter so the
>> VM won't shrink the heap back down before the processing is complete. To
>> do b) I suggest you modify setGCParameters. VM parameter 24 sets the
>> shrinkage threshold; see the comment in vmParameterAt:put:, "24 memory threshold above
>> which to shrink object memory (read-write)". As for growing memory: hmmm, I had
>> thought that there's a growMemoryBy: primitive in v3, but it appears there
>> isn't. So simply allocate a ByteArray of the desired size and then GC to
>> get rid of it. That should leave that much free space and then your load
>> should proceed without needing to GC.
>> 
>> Anyway, it's worth a try.
> 
> Thanks Eliot.
> 
> Setting the memory threshold helped. I’m still seeing one full GC which I’m trying to avoid. I’ve experimented with #setGCBiasToGrow: and #setGCBiasToGrowGCLimit: but I don’t fully understand what they do. 
> #setGCBiasToGrow: seems to turn memory growth on and off. But if this is turned off, how can the VM then allocate more memory?
> #setGCBiasToGrowGCLimit: seems to control if the growth should trigger a full GC, which seems pretty much like what I need.
> 
> Unfortunately, while setting these options seems to have an influence, I can’t quite see the pattern, and that one full GC is still there. Maybe you could explain how these options work exactly?
> 
> One other question: the MessageTally output seems to be missing about 50% of the running time. Summing up over full GC + incremental GC + time spent in the tree, leaves about 500ms unaccounted for. Do you have any idea where that half second goes missing?
> 
> 
> Here’s the code I use experimentally:
> 
> MessageTally spyOn: [
> | size shrinkThreshold |
> size := (self settings segmentsDirectory fileNamed: 'snapshot.bin') size.
> shrinkThreshold := (Smalltalk vmParameterAt: 24).
> Smalltalk vmParameterAt: 24 put: shrinkThreshold +  (size*2). "(8MB + twice file size)"
> Smalltalk setGCBiasToGrowGCLimit: shrinkThreshold +  (size*2).
> Smalltalk setGCBiasToGrow: 1. "enable growth??"
> ByteArray new: size*2.
> 
> "incremental GC should take care of collecting the ByteArray, so I’m not doing anything
> manually here"
> 
> <load snapshot> ].
> 
> 
> 
> Cheers,
> Max
> 
> 
> 
> Output from current MessageTally:
> 
>  - 1123 tallies, 1125 msec.
> 
> **Tree**
> --------------------------------
> Process: (40s)  123994112: nil
> --------------------------------
> 12.7% {143ms} CBImageSegment class(NSImageSegment class)>>basicSnapshot:from:do:
>   12.6% {141ms} CBImageSegment class(NSImageSegment class)>>installSegmentFrom:andDo:
>     12.6% {141ms} CBImageSegment class(NSImageSegment class)>>readSegmentFrom:
>       12.6% {141ms} NSSegmentStream>>readObject
>         12.6% {141ms} SmartRefStream>>nextAndClose
>           12.6% {141ms} SmartRefStream>>next
>             12.3% {138ms} SmartRefStream(ReferenceStream)>>next
>               12.3% {138ms} SmartRefStream(DataStream)>>next
>                 10.6% {119ms} CBImageSegment(ImageSegment)>>comeFullyUpOnReload:
>                   |10.6% {119ms} CBImageSegment(NSImageSegment)>>restoreEndiannessAndRehash
>                   |  5.5% {62ms} Dictionary>>rehash
>                   |    |2.8% {31ms} Dictionary>>associationsDo:
>                   |    |  |2.2% {25ms} Array(SequenceableCollection)>>do:
>                   |    |1.7% {19ms} Dictionary>>noCheckAdd:
>                   |    |  1.7% {19ms} Dictionary(HashedCollection)>>findElementOrNil:
>                   |    |    1.2% {13ms} Dictionary>>scanFor:
>                   |  4.5% {51ms} primitives
>                 1.2% {13ms} SmartRefStream(DataStream)>>readArray
>                   1.2% {13ms} SmartRefStream>>next
>                     1.2% {13ms} SmartRefStream(ReferenceStream)>>next
>                       1.2% {13ms} SmartRefStream(DataStream)>>next
> **Leaves**
> 
> **Memory**
> 	old			+94,031,228 bytes
> 	young		-9,207,660 bytes
> 	used		+84,823,568 bytes
> 	free		+90,024,824 bytes
> 
> **GCs**
> 	full			1 totalling 85ms (8.0% uptime), avg 85.0ms
> 	incr		15 totalling 271ms (24.0% uptime), avg 18.0ms
> 	tenures		10 (avg 1 GCs/tenure)
> 	root table	0 overflows
> 
> 
>> 
>> On Sat, Jan 9, 2016 at 3:03 AM, Max Leske wrote:
>> 
>>> 
>>> Hi,
>>> 
>>> I have a rather annoying problem. I’m running a time critical piece of
>>> code that reads a big (~90MB) image segment from a file. I’ve optimized
>>> loading as far as possible and now GC takes far longer than the loading
>>> itself (see the MessageTally output below).
>>> I’m wondering if there’s any possibility to defer garbage collection
>>> during the load.
>>> 
>>> For completeness, here’s the use case: the process is socket activated,
>>> which means that the first request coming in will start the process. When
>>> the image starts it will load the segment to restore the last state of the
>>> application and, once that’s done, serve the request. The critical time
>>> includes vm startup, image startup, starting the server in the image and
>>> loading the snapshot. With a big snapshot the loading time of the snapshot
>>> is the most significant contributor.
>>> 
>>> Maybe I could preallocate the needed memory to prevent the garbage
>>> collector from running?
>>> 
>>> I’d appreciate any ideas you have.
>>> 
>>> 
>>> Cheers,
>>> Max
>>> 
>>> 
>>> PS: This needs to run on a Squeak 4.0.3 VM (no JIT)
>>> 
>>> 
>>> 
>>> 
>>> Output from MessageTally:
>>> 
>>> - 1624 tallies, 1624 msec.
>>> 
>>> **Tree**
>>> --------------------------------
>>> Process: (40s)  592969728: nil
>>> --------------------------------
>>> 4.4% {72ms} CBImageSegment class(NSImageSegment class)>>basicSnapshot:from:do:
>>>   4.4% {72ms} CBImageSegment class(NSImageSegment class)>>installSegmentFrom:andDo:
>>>     4.4% {72ms} CBImageSegment class(NSImageSegment class)>>readSegmentFrom:
>>>       4.4% {72ms} NSSegmentStream>>readObject
>>>         4.4% {72ms} SmartRefStream>>nextAndClose
>>>           4.4% {72ms} SmartRefStream>>next
>>>             4.3% {70ms} SmartRefStream(ReferenceStream)>>next
>>>               4.3% {70ms} SmartRefStream(DataStream)>>next
>>>                 3.2% {52ms} NSImageSegment(ImageSegment)>>comeFullyUpOnReload:
>>>                   3.2% {52ms} NSImageSegment>>restoreEndiannessAndRehash
>>> **Leaves**
>>> 3.2% {52ms} NSImageSegment>>restoreEndiannessAndRehash
>>> 
>>> **Memory**
>>> 	old		+92,704,656 bytes
>>> 	young		-8,008,252 bytes
>>> 	used		+84,696,404 bytes
>>> 	free		+1,287,768 bytes
>>> 
>>> **GCs**
>>> 	full		2 totalling 954ms (59.0% uptime), avg 477.0ms
>>> 	incr		5 totalling 165ms (10.0% uptime), avg 33.0ms
>>> 	tenures		1 (avg 5 GCs/tenure)
>>> 	root table	0 overflows
>> 
> 
> 
> 
> 
> 
> -- 
> ===========================================================================
> John M. McIntosh. Corporate Smalltalk Consulting Ltd https://www.linkedin.com/in/smalltalk
> ===========================================================================
