Image freezing problem
John M McIntosh
johnmci at smalltalkconsulting.com
Thu Jul 26 21:17:05 UTC 2007
On Jul 26, 2007, at 10:06 AM, Adrian Lienhard wrote:
>
> c) Tune the GC policies as they are far from optimal for today's
> systems (as John has suggested a couple of times). It seems,
> though, that this cannot guarantee to fix the problem but it should
> make it less likely to happen(?).
> I also took GC statistics while running the process (only part of
> it and not when it was stuck):
> - http://www.netstyle.ch/dropbox/gcStats-1.txt.gz: standard
> configuration (was very slow)
Yes, this is the classic case: the VM thinks it is about to need a full GC, but
first checks whether an IGC will get back just one more byte. Oh, it did; fine,
let's run until the next memory allocation and repeat. I'm sure a full GC will
happen someday, or maybe 10,000 IGC iterations later.
> - http://www.netstyle.ch/dropbox/gcStats-2.txt.gz: runActive with
> setGCBiasToGrowGCLimit: 16*1026*1024
Statistical data is always good. Let's see, I have some observations and random
thoughts.
I note you have a memory allocation of 100MB that grows to 150MB. You should
look at the grow/shrink threshold values; they likely need to be adjusted.
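A sketch of adjusting those thresholds from the image, assuming the parameter indices documented in the `vmParameterAt:` method comment of the classic interpreter VM (verify the indices against your own VM before relying on them):

```smalltalk
"Raise the shrink threshold and grow headroom so the VM keeps more
 free space rather than oscillating between 100MB and 150MB.
 Index 24 = memory threshold above which to shrink object memory;
 index 25 = memory headroom when growing object memory."
Smalltalk vmParameterAt: 24 put: 32 * 1024 * 1024.  "shrink threshold"
Smalltalk vmParameterAt: 25 put: 16 * 1024 * 1024.  "grow headroom"
```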
When it is running I see:
- 45,042 marked objects: the number of objects marked when doing the IGC,
  which is all live objects in new space plus references from old space to
  new space.
- 4,001 swept objects: the number of objects visited linearly in the sweep
  after the GC mark.
- 4,000 is the allocation count.
- 639 is the survivor count.
- 4 milliseconds is the IGC time.
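If you want to capture these numbers yourself, the image can print its GC counters; a sketch, assuming the statistics-report protocol on SmalltalkImage is present in your image:

```smalltalk
"Dump the VM's GC counters (IGC count, full GC count, tenure count,
 mark/sweep statistics and so on) to the Transcript as one report."
Transcript show: SmalltalkImage current vmStatisticsReportString.
```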
Note that later this actually jumps to 86K objects; likely you have the issue
where a large array in old space has been marked as a root, and on every IGC
you are iterating over it looking for items in young space. Tenuring on each
IGC helps with that.
I believe Tim is up for fixing this problem, just slip him *lots* of
EUROS first.
In theory you could use a much larger allocation count and tenure count;
Andreas has some data on larger values. Likely 40,000 allocations with
tenuring at 35,000 would come to mind?
{Well, actually, I might take this back, given your marking issue.}
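A sketch of setting those two values, again assuming the classic interpreter VM's parameter indices (5 = allocations between GCs, 6 = survivor tenuring threshold, per the `vmParameterAt:` comment; check your VM):

```smalltalk
"Allow more allocations between incremental GCs, and tenure
 survivors when the count approaches that limit."
Smalltalk vmParameterAt: 5 put: 40000.  "allocations between IGCs"
Smalltalk vmParameterAt: 6 put: 35000.  "survivor tenuring threshold"
```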
However, the other thing is that 45,042 objects to mark seems a bit high. For
Sophie I've used the rule that if the mark count is 2x the number of
allocations, then we should tenure. Note that your 45,042 marked objects are
the result of only 639 survivors, but I think you are doing this already?
Still, you could consider how many allocations versus marked objects is
acceptable, since the 2x factor might not be correct: the mark count follows a
pattern of 312, 364, 552, 45K, 48K, 44K, 45K, 44K, 44K and then repeats, so
you might be tenuring too early? Then again, maybe not, since a full GC
doesn't return any memory.
In my JMM memory policy >> calculateGoals I have:

    statMarkCount > (statAllocationCount * 2)
        ifTrue: [[Smalltalk forceTenure] on: Error do: [:ex | ]].
    "Tenure if we think too much root table marking is going on"
Also, I can't say whether marking 45K objects should take 4 milliseconds on
your machine; I don't know if that is a rational value. Of course, if you
increase the allocation limit, the mark count will go up in proportion. Other
people might have to weigh in on whether marking only 10,000 objects per
millisecond seems rational. In checking some other examples I have, it does
seem in line with another sample.
setGCBiasToGrowGCLimit: should be set to some value that won't interfere much
with the workload. If, for example, you always expect to grab 40MB of memory
to perform a task, setting the limit to 16MB is less valuable than setting it
to 50MB. Keep in mind, of course, that once you hit the 50MB limit you will
need to do a full GC, and what is the time impact of that on what you are
doing?
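A sketch of that configuration for the 40MB-per-task workload above; `setGCBiasToGrow:` is assumed here as the companion switch that enables the bias-to-grow behavior (check its availability in your image):

```smalltalk
"Let the heap grow past the IGC limit during a burst of work,
 deferring the full GC until about 50MB of growth has accumulated,
 so the limit sits comfortably above the expected 40MB working set."
Smalltalk setGCBiasToGrow: 1.
Smalltalk setGCBiasToGrowGCLimit: 50 * 1024 * 1024.
```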
--
===========================================================================
John M. McIntosh <johnmci at smalltalkconsulting.com>
Corporate Smalltalk Consulting Ltd. http://www.smalltalkconsulting.com
===========================================================================
More information about the Squeak-dev mailing list