Adding a new immediate type
Andreas Raab
andreas.raab at gmx.de
Fri Jan 6 00:32:01 UTC 2006
Bryce Kampjes wrote:
> Bert Freudenberg writes:
> > But Points *do* have a special primitive! Number 18?
>
> True but it still calls
>
> instantiateSmallClass:sizeInBytes:fill:.
>
> All it needs to do, is grab some memory then populate it. That's a
> fairly short sequence, so long as a GC isn't necessary. The speed gain
> will be from specialising the entire primitive for the case where a GC
> isn't needed. It's more code, and more special cases, but still
> probably easier than creating new immediates.
Well, an interesting test would be to change primitiveNew and friends to
clone prototypes instead of going through the full allocation path (e.g.,
use a prototype cache where we store one prototype per class). I believe
that primitiveClone is the fastest way of creating a new object because
it does exactly what you say - grab memory and copy some values. So if
there is any difference, this should make it noticeable.
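A rough image-level approximation of that test, as a workspace sketch only. It does not touch primitiveNew in the VM; it just compares plain instantiation against copying a prototype. The assumptions: the variable `prototype` stands in for a hypothetical per-class cache entry, and `shallowCopy` ends up in primitiveClone in your image (worth checking before trusting the numbers).

```smalltalk
| n prototype viaNew viaClone |
n := 1000000.

"Hypothetical stand-in for one entry of a per-class prototype cache."
prototype := Point basicNew.

"Regular allocation through basicNew."
viaNew := Time millisecondsToRun: [1 to: n do: [:i | Point basicNew]].

"Allocation by copying the prototype (assumed to reach primitiveClone)."
viaClone := Time millisecondsToRun: [1 to: n do: [:i | prototype shallowCopy]].

Transcript
    show: 'basicNew: ', viaNew printString, ' ms'; cr;
    show: 'shallowCopy: ', viaClone printString, ' ms'; cr.
```

Both loops produce the same amount of garbage, so any difference between the two timings should come from the allocation path rather than from GC.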
> My interest is optimising compiled code with an inlined
> primitive. There I'm hoping that allocation should take about 10-20
> clocks if a GC isn't needed. For a handful of critical types, a
> similar optimisation may pay off for the interpreter.
To be honest - I don't expect this to make any significant difference.
Simply because allocation speed is only a fraction of the cost of even a
loop like:
1 to: 1000000 do: [:i | 0@0]
If you actually measure that you'll find that about half of the time is
spent in GC and that's for the *optimal* (and completely unrealistic)
case of zero survivors (meaning GC is a lot faster than usual) and not
even counting the loop overhead. If you actually measure the loop
overhead (which creates no GCs) and subtract that from the tally results
you end up with something that says that more than 80% of the remaining
time (which is due to actual allocation) is spent in garbage collection.
In other words, even reducing allocation costs to zero means you would
probably only gain about 10% in a benchmark like the above. In reality (e.g.,
with some actual load on the garbage collector so that both mark phase
and compaction have some work to do) you'll be closer to 5% I think. And
that'd be for *zero* allocation costs, mind you!
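For anyone who wants to repeat the measurement, here is a minimal workspace sketch. It assumes that `Smalltalk vmParameterAt: 8` and `Smalltalk vmParameterAt: 10` answer the total milliseconds spent in full and incremental GCs (check the comment in `vmParameterAt:` in your image); the block `gcMillis` is just a local helper made up for this example.

```smalltalk
| n gcMillis loopOnly withAlloc gcBefore gcTime allocTotal |
n := 1000000.

"Local helper (assumption): ms spent in full (8) plus incremental (10) GCs."
gcMillis := [(Smalltalk vmParameterAt: 8) + (Smalltalk vmParameterAt: 10)].

"Baseline: the same loop without allocation, so it should trigger no GCs."
loopOnly := Time millisecondsToRun: [1 to: n do: [:i | nil]].

"The loop from above: one Point allocated per iteration."
gcBefore := gcMillis value.
withAlloc := Time millisecondsToRun: [1 to: n do: [:i | 0@0]].
gcTime := gcMillis value - gcBefore.

"Everything beyond the loop overhead is allocation plus GC."
allocTotal := withAlloc - loopOnly.

Transcript
    show: 'loop only: ', loopOnly printString, ' ms'; cr;
    show: 'with allocation: ', withAlloc printString, ' ms'; cr;
    show: 'GC: ', gcTime printString, ' ms'; cr;
    show: 'allocation excluding GC: ', (allocTotal - gcTime) printString, ' ms'; cr.
```

The arithmetic behind the 10% figure: if GC is about half of the total and more than 80% of what remains after subtracting the loop overhead, then that remainder is at most about 62% of the total (0.5 / 0.8), which leaves at most roughly 12% of the total for allocation itself.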
So in any case, I'd really like to see a few benchmarks. A theoretical
discussion is only useful up to a point and for me, that point has
been reached by now ;-)
Cheers,
- Andreas