frustrations in de-slowifying code.

Sat Mar 20 16:36:24 UTC 2004

Mmm, I'm wondering here if you need to adjust the garbage collector.
Right now the VM (Unix/Mac/Windows) will grow memory by a set amount,
or by the amount needed plus slack. Once it has grown, if memory in new
space comes free and the free space exceeds a threshold, we shrink
things again. This is controlled by vmParameters 24 and 25.

Bad choices lead to overhead as we grow and shrink the memory allocated
to the VM. Not that we point out what the 'good' choices are, in any case.

> Smalltalk vmParameterAt: 25 put: 24*1024*1024. "grow headroom"
> 	Smalltalk vmParameterAt: 24 put: 48*1024*1024.  "shrink threshold"

The other magic numbers are 5 & 6, which control when an incremental GC
is triggered and when we tenure objects.
So, for example, if I know I'm going to create a bunch of points
(100,000+), I might consider changing these to at: 5 put: 50000 and
at: 6 put: 40000. The trade-off here is how long it will take to trace
50,000 live objects. Then again, machines are so fast now that perhaps
we need to revisit this. {I'm sure that for years now I've considered
rewriting the logic a bit so we can adapt the values to machine
characteristics, since these values were chosen to give 25 MHz 68040
Macs minimal IGC times so that music playback would work without
stuttering.}
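A minimal workspace sketch of that experiment, assuming an image where
Smalltalk vmParameterAt: and vmParameterAt:put: behave as in the snippet
above. It saves the current values of parameters 5 and 6, applies the
larger ones while the points are built, and restores the originals in an
ensure: block. The 100,000-point loop is only a stand-in for real work,
and 50000/40000 are the example numbers from the text, not tuned values.

```smalltalk
| oldAllocs oldTenure points |
"Remember the image's current settings so they can be restored afterwards."
oldAllocs := Smalltalk vmParameterAt: 5.
oldTenure := Smalltalk vmParameterAt: 6.
[Smalltalk vmParameterAt: 5 put: 50000.	"allocations between incremental GCs"
 Smalltalk vmParameterAt: 6 put: 40000.	"tenure when more than this many objects survive"
 "Stand-in workload: 100,000 points, as in the example above."
 points := (1 to: 100000) collect: [:i | i @ i]]
	ensure: [
		"Put the original values back even if the workload fails."
		Smalltalk vmParameterAt: 5 put: oldAllocs.
		Smalltalk vmParameterAt: 6 put: oldTenure].
```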

Since I'm speaking on this topic at Smalltalk Solutions 2004, perhaps
it's time for some coding...

"Defaults found in the image"
SmalltalkImage current vmParameterAt: 5 put: 4000.  "do an incremental GC after this many allocations"
SmalltalkImage current vmParameterAt: 6 put: 2000.  "tenure when more than this many objects survive the GC"

I will note that I looked at this last summer and found that, for the
base image and what people normally do, altering the current values
perhaps doesn't buy you anything. Good values are reflective of the work
you are doing.

Plus, of course, there is how we look at the remember table: old objects
that point to new objects. If you get a large collection into that
logic, we spin through all its elements (millions?) when an IGC occurs.
This can be solved either by rewriting the GC logic or by doing a full
GC in the right place so things get tenured. Mind you, having adaptive
feedback from the GC logic watching for this would work too {see John's
list of things to do someday}.
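A rough way to look for old-to-new pressure, assuming the remember table
above is what the VM reports as the root table (parameters 21 and 22 in
the list further down): compare the root table overflow count before and
after the suspect code. The workload block is a hypothetical placeholder.
This only hints at trouble; a single huge collection occupies one root
entry, so it may not show up as overflows even though all its elements
get scanned on every IGC.

```smalltalk
| workload before after |
"Hypothetical stand-in for the code under suspicion."
workload := [(1 to: 100000) collect: [:i | i @ i]].
before := Smalltalk vmParameterAt: 22.	"root table overflows since startup"
workload value.
after := Smalltalk vmParameterAt: 22.
Transcript
	show: 'root table size: ', (Smalltalk vmParameterAt: 21) printString;
	show: '  overflows during run: ', (after - before) printString;
	cr.
```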

I'd suggest that if you can create all your objects, do a full GC, and
then do the computations, that might improve things.
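A sketch of that ordering, assuming the full GC tenures the survivors as
described above (Smalltalk garbageCollect is the full-GC call). The
points and the sum-of-products loop are hypothetical stand-ins for the
real objects and computation.

```smalltalk
| points sum |
"Phase 1: build all the data first (hypothetical example data)."
points := (1 to: 100000) collect: [:i | i @ (i * 2)].
"Phase 2: full GC, so the survivors end up in old space and later
 incremental GCs do not have to trace them again."
Smalltalk garbageCollect.
"Phase 3: run the computation over the already-tenured objects."
sum := 0.
points do: [:p | sum := sum + (p x * p y)].
Transcript show: sum printString; cr.
```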

	VM parameters are numbered as follows:
		1	end of old-space (0-based, read-only)
		2	end of young-space (read-only)
		3	end of memory (read-only)
		4	allocationCount (read-only)
		5	allocations between GCs (read-write)
		6	survivor count tenuring threshold (read-write)
		7	full GCs since startup (read-only)
		8	total milliseconds in full GCs since startup (read-only)
		9	incremental GCs since startup (read-only)
		10	total milliseconds in incremental GCs since startup (read-only)
		11	tenures of surviving objects since startup (read-only)
		12-20 specific to the translating VM
		21	root table size (read-only)
		22	root table overflows since startup (read-only)
		23	bytes of extra memory to reserve for VM buffers, plugins, etc.
		24	memory threshold above which shrinking object memory (rw)
		25	memory headroom when growing object memory (rw)
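To see whether parameter changes help, one approach is to snapshot the
read-only counters 7 through 11 from the list above around the code being
tuned and print the differences. A sketch, where the workload block is a
hypothetical placeholder for the real computation:

```smalltalk
| workload stats before after |
"Hypothetical stand-in for the code being tuned."
workload := [(1 to: 100000) collect: [:i | i @ i]].
"Read parameters 7..11: full GCs, full GC ms, incremental GCs, incremental GC ms, tenures."
stats := [(7 to: 11) collect: [:n | Smalltalk vmParameterAt: n]].
before := stats value.
workload value.
after := stats value.
Transcript
	show: 'full GCs: ', ((after at: 1) - (before at: 1)) printString;
	show: '  full GC ms: ', ((after at: 2) - (before at: 2)) printString;
	show: '  incr GCs: ', ((after at: 3) - (before at: 3)) printString;
	show: '  incr GC ms: ', ((after at: 4) - (before at: 4)) printString;
	show: '  tenures: ', ((after at: 5) - (before at: 5)) printString;
	cr.
```

The counters are cumulative since startup, which is why the sketch
subtracts; running it once with the default values of 5 and 6 and once
with larger ones gives a rough comparison.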

On Mar 20, 2004, at 6:58 AM, David T. Lewis wrote:

> On Sat, Mar 20, 2004 at 02:36:57AM -0500, Alan Grimes wrote:
>> om
>>
>> I have been working on my retina emulator and have gotten to the point
>> of optimizing it...
>
> Hi Alan,
>
> I assume you are aware of the TimeProfileBrowser... if not, that should
> absolutely be your next step. See <http://minnow.cc.gatech.edu/squeak/2133>.
>
>> I've hit several frustrations:
>>
>> 1. Making revisions that would seem to combine the jobs of a 4 second
>> method and a 5 second method, removing one of the loops and the
>> creation of a large array in the process, produced a 15 second
>> method... =\
>>
>> 2. The run times are _EXTREMELY_ variable. Said 15 second method
>> sometimes takes as long as 5 _MINUTES_.
>
> If your image is getting big enough to generate paging activity in your
> operating system, you will see severe performance problems. I don't know
> what OS you are using, but for example on Linux you could use xosview to
> get a good idea of what is going on with swapping in the OS, and in
> your Squeak image use world menu->help->VM statistics to look at your
> object memory.
>
>> 4. I noticed that there is a primitive called "matrix 3x3 multiply"
>> which _DOES NOT_ do the textbook matrix multiplication but rather a
>> 3-part vector multiply with some other computations. This could
>> eliminate most of the code in my 19.5 second method. However, I don't
>> fully understand primitives. It seems that once a primitive is called it
>> takes over a method and has access to all of its parameters (though this
>> is not completely obvious).
>>
>> It would be amazingly useful to have a primitive tutorial and reference
>> manual...
>
> There is some reasonable information on the Swiki, but based on what
> you described, you do not need a primitive. At least not yet. There may
> be things you can change in your Squeak code that would give you an
> order of magnitude improvement before you have to think about writing
> a primitive. After that, you will be able to see (with the
> TimeProfileBrowser) how much improvement you might expect to get out of
> writing a primitive, and whether it would be worth your time and effort.
>
> Dave
>
--
===========================================================================
John M. McIntosh <johnmci at smalltalkconsulting.com> 1-800-477-2659
Corporate Smalltalk Consulting Ltd.  http://www.smalltalkconsulting.com
===========================================================================