Adding a new immediate type

Andreas Raab andreas.raab at gmx.de
Fri Jan 6 00:32:01 UTC 2006


Bryce Kampjes wrote:
> Bert Freudenberg writes:
>  > But Points *do* have a special primitive! Number 18?
> 
> True but it still calls
> 
>   instantiateSmallClass:sizeInBytes:fill:.
> 
> All it needs to do, is grab some memory then populate it. That's a
> fairly short sequence, so long as a GC isn't necessary. The speed gain
> will be from specialising the entire primitive for the case where a GC
> isn't needed. It's more code, and more special cases, but still
> probably easier than creating new immediates.

Well, an interesting test would be to change primitiveNew and friends to 
clone prototypes instead of going through the full allocation path (e.g., 
use a prototype cache where we store one prototype per class). I believe 
that primitiveClone is the fastest way of creating a new object because 
it does exactly what you say - grab memory and copy some values. So if 
there is any difference this should make it noticeable.
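
A rough way to try this from a workspace, without touching the VM, might 
look like the sketch below. It is only an image-level approximation of the 
idea: "prototypes" is a hypothetical per-class cache, and it assumes that 
shallowCopy is backed by the clone primitive in the image at hand.

```smalltalk
"Workspace sketch: compare normal allocation with cloning a cached prototype.
 'prototypes' is a hypothetical per-class cache, not an existing VM structure."
| prototypes proto viaNew viaClone |
prototypes := IdentityDictionary new.
prototypes at: Point put: Point basicNew.	"one prototype per class"
proto := prototypes at: Point.

Smalltalk garbageCollect.	"start both runs from a similar heap state"
viaNew := Time millisecondsToRun: [1 to: 1000000 do: [:i | Point basicNew]].
Smalltalk garbageCollect.
viaClone := Time millisecondsToRun: [1 to: 1000000 do: [:i | proto shallowCopy]].

Transcript
	show: 'basicNew: ', viaNew printString, ' ms'; cr;
	show: 'shallowCopy: ', viaClone printString, ' ms'; cr.
```

Both loops still pay for message sends and for GC, so any gap between the 
two numbers only hints at what a change inside primitiveNew could gain.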

> My interest is optimising compiled code with an inlined
> primitive. There I'm hoping that allocation should take about 10-20
> clocks if a GC isn't needed. For a handful of critical types, a
> similar optimisation may pay off for the interpreter.

To be honest - I don't expect this to make any significant difference. 
Simply because allocation speed is only a fraction of the cost of even a 
loop like:
	1 to: 1000000 do:[:i| 0@0]
If you actually measure that you'll find that about half of the time is 
spent in GC and that's for the *optimal* (and completely unrealistic) 
case of zero survivors (meaning GC is a lot faster than usual) and not 
even counting the loop overhead. If you actually measure the loop 
overhead (which creates no GCs) and subtract that from the tally results 
you end up with something that says that more than 80% of the remaining 
time (which is due to actual allocation) is spent in garbage collection.
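
The measurement described above could be sketched in a workspace roughly 
as follows, with the point creation written as 0@0. The variable names are 
illustrative, and the absolute numbers will depend on the VM and machine.

```smalltalk
"Workspace sketch: time the allocating loop and the bare loop separately."
| withAlloc loopOnly |
Smalltalk garbageCollect.
withAlloc := Time millisecondsToRun: [1 to: 1000000 do: [:i | 0@0]].
loopOnly := Time millisecondsToRun: [1 to: 1000000 do: [:i | nil]].

Transcript
	show: 'with allocation: ', withAlloc printString, ' ms'; cr;
	show: 'loop only: ', loopOnly printString, ' ms'; cr;
	show: 'difference: ', (withAlloc - loopOnly) printString, ' ms'; cr.

"Profile the allocating loop to see where its time goes."
MessageTally spyOn: [1 to: 1000000 do: [:i | 0@0]].
```

The difference between the two timings is the part attributable to 
allocation plus the garbage collection it triggers.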

In other words, even reducing allocation costs to zero would probably 
gain you only about 10% in a benchmark like the above. In reality (e.g., 
with some actual load on the garbage collector so that both mark phase 
and compaction have some work to do) you'll be closer to 5% I think. And 
that'd be for *zero* allocation costs, mind you!
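
One way to reconstruct that estimate is the small calculation below. It 
assumes GC is exactly half of the total time and exactly 80% of the 
non-loop time; both figures come from the paragraph above, and the 1000 
time units are arbitrary.

```smalltalk
"Back-of-the-envelope split of the benchmark time (arbitrary units)."
| total gc allocRelated loop alloc |
total := 1000.	"whole benchmark"
gc := total / 2.	"about half of the time is GC: 500"
allocRelated := gc * 10 / 8.	"GC is 80% of the non-loop time: 625"
loop := total - allocRelated.	"loop overhead: 375"
alloc := allocRelated - gc.	"pure allocation: 125"
alloc * 100 / total	"print it: 25/2, i.e. 12.5 percent of the total"
```

Since GC takes more than 80% of the non-loop time, 12.5% is an upper 
bound on the allocation share, which fits the figure of roughly 10%.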

So in any case, I'd really like to see a few benchmarks. A theoretical 
discussion is only useful up to a point and for me, that point has 
been reached by now ;-)

Cheers,
   - Andreas


