Startup time, a short review & discussion

John M McIntosh johnmci at smalltalkconsulting.com
Thu Jul 31 22:54:55 UTC 2003


I've been glancing at Squeak startup time on the Mac this morning and
have a few observations. I have noticed that on Windows the images
"SNAP" open, but take seconds on the Mac, and I'd like to fix that.
What I've discovered is that:

13.3% of the time goes to initializeObjectMemory (actually
adjustAllOopsBy:by:).
4.3% of the time goes to flushExternalPrimitives.
(Ignore the rest, because I'm not sure how to fix it yet.)

In adjustAllOopsBy:by: we rewrite ALL oops to zap the Root Bit from the  
header, then we call
adjustFieldsAndClassOf: by: to swizzle the memory location based on the  
offset.

In flushExternalPrimitives we again visit all the oops in memory and
look for compiled methods that have a primitiveIndexOf equal to
PrimitiveExternalCallIndex, at which point we invoke
flushExternalPrimitiveOf: to zero out the session and address of the
external primitive in the compiled method.
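
To make the cost concrete, here is a rough C sketch of the two passes as
described above. It is a toy model, not the VM source: the ToyObject
layout, the function names, and the two constants are assumptions made
for illustration only. The point is that both loops touch every object
and pass 1 writes to every object, whether or not anything changes.

```c
#include <stddef.h>
#include <stdint.h>

/* Illustrative values only; not taken from the VM source. */
#define ROOT_BIT            0x40000000u
#define EXTERNAL_CALL_INDEX 117

/* Hypothetical, heavily simplified stand-in for an object in the image. */
typedef struct {
    uint32_t  header;              /* flag bits, including ROOT_BIT */
    uintptr_t class_oop;           /* a pointer field that must be swizzled */
    int       is_compiled_method;
    int       primitive_index;
    uintptr_t ext_session;         /* cached session of an external primitive */
    uintptr_t ext_address;         /* cached address of an external primitive */
} ToyObject;

/* Pass 1: clear the root bit and add the offset on EVERY object.
   Returns the number of memory writes performed (two per object). */
static size_t adjust_all_oops(ToyObject *objs, size_t n, intptr_t offset)
{
    size_t writes = 0;
    for (size_t i = 0; i < n; i++) {
        objs[i].header &= ~ROOT_BIT;             /* written even if already clear */
        objs[i].class_oop += (uintptr_t)offset;  /* written even if offset is 0 */
        writes += 2;
    }
    return writes;
}

/* Pass 2: visit every object again and zero the cached session and
   address of compiled methods that use the external-call primitive.
   Returns the number of methods flushed. */
static size_t flush_external_primitives(ToyObject *objs, size_t n)
{
    size_t flushed = 0;
    for (size_t i = 0; i < n; i++) {
        if (objs[i].is_compiled_method &&
            objs[i].primitive_index == EXTERNAL_CALL_INDEX) {
            objs[i].ext_session = 0;
            objs[i].ext_address = 0;
            flushed++;
        }
    }
    return flushed;
}
```

In this model an image of n objects costs 2n writes in pass 1 plus a
second full scan in pass 2, regardless of the offset.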

So what can we do?

a) Because of the magic of BSD malloc, a saved Squeak Macintosh image &
Virtual Machine usually ends up allocating the image at the same
location in the virtual memory address space, thus the offset is
usually ZERO. Rewriting and fiddling with all the oops, when adding
zero to them changes nothing, is really quite pointless. I propose to
change Interpreter>>adjustFieldsAndClassOf: by: to return if the
offset is zero.

b) Rewriting all the oops to clear the Root bit is also tedious,
especially since in my image of 251,816 objects only 1 object has the
root bit set. I propose to only clear and write if the bit was set.

c) After looking again at 251,816 objects we only find 360 compiled
methods that match the criteria for flushing the session & address.
Therefore I propose to remember those oop addresses (somewhere,
perhaps in an existing array temporarily), then spin through them
after the oops have been adjusted for the possible offset and Root
bit. If we exceed, say, 4096 objects we can just spin through the data
again; the penalty for the extra work is minimal.
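
Proposals a), b) and c) can be sketched together in C roughly as
follows. Again this is a toy model under stated assumptions, not the
actual change to the VM: ToyObject, StartupStats, adjust_and_flush, the
constant values, and the use of array indices in place of real oops are
all invented for illustration. Only the 4096 limit comes from the text
above.

```c
#include <stddef.h>
#include <stdint.h>

/* Illustrative values only; not taken from the VM source. */
#define ROOT_BIT            0x40000000u
#define EXTERNAL_CALL_INDEX 117
#define MAX_REMEMBERED      4096   /* the "say 4096" limit from c) */

/* Hypothetical, heavily simplified stand-in for an object in the image. */
typedef struct {
    uint32_t  header;
    uintptr_t class_oop;
    int       is_compiled_method;
    int       primitive_index;
    uintptr_t ext_session;
    uintptr_t ext_address;
} ToyObject;

typedef struct {
    size_t header_writes;  /* root bits actually cleared */
    size_t field_writes;   /* pointer fields actually swizzled */
    size_t flushed;        /* external primitives flushed */
    int    overflowed;     /* 1 if the remembered table filled up */
} StartupStats;

static int needs_flush(const ToyObject *o)
{
    return o->is_compiled_method &&
           o->primitive_index == EXTERNAL_CALL_INDEX;
}

static void flush_one(ToyObject *o)
{
    o->ext_session = 0;
    o->ext_address = 0;
}

static StartupStats adjust_and_flush(ToyObject *objs, size_t n, intptr_t offset)
{
    /* Indices stand in for remembered oops in this model. */
    static size_t remembered[MAX_REMEMBERED];
    size_t n_remembered = 0;
    StartupStats s = {0, 0, 0, 0};

    for (size_t i = 0; i < n; i++) {
        ToyObject *o = &objs[i];
        if (o->header & ROOT_BIT) {          /* b) write only if the bit is set */
            o->header &= ~ROOT_BIT;
            s.header_writes++;
        }
        if (offset != 0) {                   /* a) skip swizzling for offset 0 */
            o->class_oop += (uintptr_t)offset;
            s.field_writes++;
        }
        if (needs_flush(o)) {                /* c) remember it for later */
            if (n_remembered < MAX_REMEMBERED)
                remembered[n_remembered++] = i;
            else
                s.overflowed = 1;
        }
    }

    if (s.overflowed) {
        /* Too many candidates: fall back to a second full scan. */
        for (size_t i = 0; i < n; i++)
            if (needs_flush(&objs[i])) { flush_one(&objs[i]); s.flushed++; }
    } else {
        /* Normal case: touch only the remembered methods. */
        for (size_t k = 0; k < n_remembered; k++) {
            flush_one(&objs[remembered[k]]);
            s.flushed++;
        }
    }
    return s;
}
```

With an offset of zero and a single object carrying the root bit, this
model performs one header write and no field writes, and the flush step
visits only the remembered methods instead of rescanning the image. The
flush is kept as a separate later step because the text above asks for
it to run after the oops have been adjusted.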

In doing all this I can reduce flushExternalPrimitives to zero and
initializeObjectMemory to 6.9%, removing 10% of the startup time for
your standard 3.6 image. I'm still chasing the rest of the 93%, of
which a majority lies in the read logic.

--
===========================================================================
John M. McIntosh <johnmci at smalltalkconsulting.com> 1-800-477-2659
Corporate Smalltalk Consulting Ltd.  http://www.smalltalkconsulting.com
===========================================================================


