[Vm-dev] Strongtalk and Exupery
bryce at kampjes.demon.co.uk
Sat Sep 23 19:41:17 UTC 2006
The bytecode benchmark is a prime number sieve. It uses #at: and
#at:put:. The send benchmark is a simple recursive Fibonacci function.
Both just measure how quickly the code executes; neither really
measures the actual bytecodes or sends performed. They are the old
tinyBenchmarks. I'd guess everyone ran the same code for these
I 100% agree that inlining is the right way to optimise common sends
and block execution. I'd just rather finish debugging Exupery and
get it fully working without inlining, then add inlining. Inlining
will add another case to think about when debugging. Debugging full
method inlining (1) will be much easier if the compiler is bug free
My rough long term plan is:
1.0: The minimum necessary to be useful.
3.0: SSA optimisation
A strong reason for not doing inlining in 1.0 is that it will reduce
scope creep. If inlining is not in 1.0 then finishing 1.0 is more important.
I'd also not be surprised if Strongtalk is faster than Exupery for
bytecode performance. I'm guessing that Strongtalk's integer
arithmetic and #at: performance are better. Squeak uses 1 for its
integer tag, so in general it takes 3 instructions to detag then retag
and 2 clocks latency (this can often be optimised to 1 instruction
and 1 clock latency). I'm guessing Strongtalk uses 0 for its integer
tag.
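
A rough sketch of the tagging difference, in C rather than Smalltalk. The names (oop, tag1, add_tag1_fast and so on) are made up for illustration and are not taken from either VM, and overflow checks are left out. It only shows why a tag of 1 costs extra work on an add while a tag of 0 does not:

```c
#include <assert.h>
#include <stdint.h>

typedef intptr_t oop;  /* hypothetical name for a tagged machine word */

/* Tag-1 scheme (as described above for Squeak): low bit set = small integer. */
static oop tag1(intptr_t v)   { return v * 2 + 1; }
static intptr_t untag1(oop o) { assert(o & 1); return (o - 1) / 2; }

/* General case: detag the operands, do the operation, retag the result. */
static oop add_tag1_general(oop a, oop b) {
    return tag1(untag1(a) + untag1(b));
}

/* Optimised case: strip one tag so the sum carries exactly one tag bit.
   (2x + 1 - 1) + (2y + 1) == 2(x + y) + 1 */
static oop add_tag1_fast(oop a, oop b) {
    assert((a & 1) && (b & 1));
    return (a - 1) + b;
}

/* Tag-0 scheme (only a guess for Strongtalk): low bit clear = small integer. */
static oop tag0(intptr_t v)   { return v * 2; }
static intptr_t untag0(oop o) { assert((o & 1) == 0); return o / 2; }

/* No detag or retag needed: 2x + 2y == 2(x + y). */
static oop add_tag0(oop a, oop b) { return a + b; }
```

With a tag of 0 the add is a plain machine add, which is the case I'm guessing Strongtalk gets for free.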
Squeak uses a remembered set for its write barrier which requires
checking if the object is in the remembered set, and checking if the
object is in new-space before adding it. Strongtalk might be using a
card marking table just requiring a single store.
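
To make the two barriers concrete, here is a toy model in C. Everything in it is an assumption for illustration: the Obj layout, the constants, and the exact order of the checks are invented, and neither function is the real Squeak or Strongtalk code. The remembered-set version has to test and branch before it records anything; the card-marking version is one unconditional byte store.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum { HEAP_OBJS = 256, YOUNG_START = 192, CARD_SHIFT = 4, MAX_REMEMBERED = 64 };

typedef struct Obj {
    bool remembered;     /* hypothetical "already in the remembered set" flag */
    struct Obj *slot;    /* a single pointer field, enough for the sketch */
} Obj;

static Obj heap[HEAP_OBJS];               /* objects at index >= YOUNG_START are "new" */
static Obj *remembered_set[MAX_REMEMBERED];
static size_t remembered_count;
static unsigned char cards[HEAP_OBJS >> CARD_SHIFT];

static bool in_new_space(const Obj *o) { return o >= &heap[YOUNG_START]; }

/* Remembered-set barrier: several checks before the object is recorded. */
static void store_with_remembered_set(Obj *target, Obj *value) {
    target->slot = value;
    if (value != NULL && !in_new_space(target) && in_new_space(value)
            && !target->remembered) {
        assert(remembered_count < MAX_REMEMBERED); /* a real VM would collect here */
        target->remembered = true;
        remembered_set[remembered_count++] = target;
    }
}

/* Card-marking barrier: mark the card covering the target, no checks at all. */
static void store_with_card_marking(Obj *target, Obj *value) {
    target->slot = value;
    cards[(size_t)(target - heap) >> CARD_SHIFT] = 1;
}
```

The card table pushes the cost to collection time, when the collector scans the dirty cards, instead of paying for the checks on every store.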
Squeak stores the size of an object in one of two places. So to get
the size to range check you first need to figure out where it's
stored. I'm guessing that the size for an array is stored at a fixed
location in Strongtalk.
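
A simplified illustration of the size lookup, again in C. The header layout here is hypothetical (a 2-bit type field and a 6-bit short size in the base header, with an optional size word just before it) and is only loosely modelled on Squeak's format; the fixed-location version is a guess at what a simpler layout would allow.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical header encoding, not the exact Squeak bit layout. */
enum {
    TYPE_MASK          = 0x3,   /* low two bits: header type */
    TYPE_HAS_SIZE_WORD = 0,     /* type 0: size lives in a separate word */
    SHORT_SIZE_SHIFT   = 2,
    SHORT_SIZE_MASK    = 0x3F   /* 6-bit size kept in the base header */
};

/* Size in one of two places: first decide where it is, then load it.
   `header` points at the base header; the optional size word precedes it. */
static uint32_t size_two_places(const uint32_t *header) {
    if ((*header & TYPE_MASK) == TYPE_HAS_SIZE_WORD)
        return header[-1] >> 2;                 /* large object: extra word */
    return (*header >> SHORT_SIZE_SHIFT) & SHORT_SIZE_MASK;
}

/* Size at a fixed location: a single load, no decision needed. */
static uint32_t size_fixed(const uint32_t *header) {
    return header[1];
}

/* 1-based range check as needed by #at: and #at:put:. */
static bool index_in_range(uint32_t index, uint32_t size) {
    return index >= 1 && index <= size;
}
```

The extra test and branch in size_two_places is the cost that every inlined #at: has to pay before it can do the range check.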
My assumptions about Strongtalk's object memory are based on reading
the papers from the Self project.
None of these things really matters to Squeak while it's running as an
interpreter because most of the time is spent recovering from branch
mispredicts or waiting for memory, leaving plenty of time available to
hide the inefficiencies above.
One way to get around a slow compiler would be to save the code cache
beside the image. All relocation is done in Smalltalk, so doing this
shouldn't be too hard. But figuring out how to get around a slow compiler
can wait until after the compiler has become useful. There are several
answers including writing a faster register allocator (2) or being the
(1) Exupery can already inline primitives. It uses primitive inlining
to optimise #at: and #at:put:. This is one reason why Exupery has
PICs. They are a way to get type information for primitive calls.
(2) Having a coalescing register allocator makes unnecessary moves
free. This is helpful to hide working on a two operand machine from
the compiler front end. There may be some work to make Exupery perform
well without its register allocator.