[squeak-dev] Cog VM status update
andreas.raab at gmx.de
Wed Dec 17 03:30:30 UTC 2008
I just read Eliot's most recent blog post about the Cog VM and it
reminded me how difficult it must be for others to see where this
project stands. So here is a bit of an update on the current status:
As you may recall, we started this project in spring this year by hiring
Eliot for the express purpose of building us a new VM that would speed
up execution of our products. We decided to structure the work into
stages, at the end of each of which there would be a tangible
deliverable, i.e., a new VM that could be run and benchmarked.
The first stage in this process is what we call the "Closure VM". It is
nothing more (and nothing less) than a Squeak VM with closures and the
required support (compiler, decompiler, debugger etc). Given past
experience, we had originally expected this stage to cost us some speed
(up to 20% was estimated), since closure support has a cost which at
that stage is hard to offset with other improvements. However, thanks to
a truly ingenious bit of engineering by Eliot in the design of the
closure implementation, the resulting speed difference was negligible.
Since there was no speed penalty we decided to ship it earlier than we
had originally anticipated, and the Closure VM has been the regular
shipping VM with Qwaq products since September this year.
The second stage in the process is the "Stack VM". It is a Closure VM
that executes on the native stack and transparently maps contexts from
and to stack frames as required. The VM itself is still an interpreter
so any speed improvements come purely from the more efficient
organization of the stack layout (no allocations, overlapping frames
etc). For those of you who have been around long enough, it is
equivalent to what Anthony Hannan did a few years ago, except that it
hides the existence of the native stack entirely and gives the
programmer the naive view of just dealing with linked frames (contexts).
Eliot's original expectations for the resulting speedups were a little
higher than what we've seen in practice, but the results are in line
with those Anthony got: approx. 30% improvement across the board in
macro benchmarks. The work on the Stack VM was completed last month; we
are currently rolling it out internally, and the next product release
will ship with the Stack VM.
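The frame organization described above can be sketched in C (the
language the Squeak VM is ultimately generated into). This is a
hypothetical toy, not the actual Stack VM code: it shows how, on a
contiguous value stack, a callee's frame can overlap the caller's
outgoing arguments, so activating a method needs neither a heap
allocation nor any argument copying, which is where the interpreter's
speedup comes from.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sketch: a contiguous value stack where a callee's frame
 * overlaps the caller's outgoing arguments, in contrast to heap-allocated
 * Context objects, which require an allocation per send plus copying. */

#define STACK_SIZE 256

typedef struct {
    long slots[STACK_SIZE];
    long *sp;   /* top of stack (next free slot) */
    long *fp;   /* base of the current frame */
} VmStack;

static void vm_init(VmStack *s) { s->sp = s->slots; s->fp = s->slots; }

static void vm_push(VmStack *s, long v) { *s->sp++ = v; }

/* "Send": the new frame starts at the first pushed argument, so the
 * caller's outgoing arguments become the callee's incoming ones in place. */
static long *vm_activate(VmStack *s, int argc) {
    long *oldFp = s->fp;
    s->fp = s->sp - argc;                  /* frame overlaps the arguments */
    vm_push(s, (long)(oldFp - s->slots));  /* save caller's fp for return */
    return s->fp;
}

static void vm_return(VmStack *s, int argc, long result) {
    long savedFp = s->fp[argc];  /* slot pushed by vm_activate */
    s->sp = s->fp;               /* pop the whole frame, args included */
    s->fp = s->slots + savedFp;
    vm_push(s, result);          /* leave the result for the caller */
}
```

Mapping such frames back to Context objects on demand (for the debugger,
for instance) is the part the real Stack VM makes transparent; this
sketch omits it entirely.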
The third stage, which has just begun, is what we call the "Simple JIT
VM" (well, really it doesn't have a name yet, I just made it up ;-) Its
focus is send performance (as we see send performance as the single
biggest current bottleneck). It will sport a very simple JIT with inline
caches, the idea being to bring send performance up to the point
where it's no longer the single biggest bottleneck, then measure
performance again and figure out what the next best target is. I am not
going to speculate on performance (we have been wrong every single step
of the way ;-) but both Eliot and I do think that we'll see some nice
improvements in application performance here.
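To make the inline-cache idea concrete, here is a minimal sketch in C of
a monomorphic inline cache. It is an illustration of the general
technique, not the Cog design: each send site remembers the receiver
class from its last send, and when the class matches, the cached method
runs without any method lookup.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical sketch of a monomorphic inline cache. All names and the
 * 4-slot method table are illustrative only. */

typedef long (*Method)(long receiver);

typedef struct {
    const char *name;
    const char *selectors[4];  /* tiny fixed-size method table */
    Method methods[4];
} Class;

typedef struct {
    Class *cachedClass;   /* NULL until the first send from this site */
    Method cachedMethod;
    long hits, misses;
} SendSite;

/* Slow path: linear lookup in the class's method table. We assume the
 * receiver understands the selector, so no doesNotUnderstand handling. */
static Method lookup(Class *cls, const char *selector) {
    for (int i = 0; i < 4 && cls->selectors[i]; i++)
        if (strcmp(cls->selectors[i], selector) == 0)
            return cls->methods[i];
    return NULL;
}

static long send(SendSite *site, Class *cls, long receiver,
                 const char *selector) {
    if (site->cachedClass == cls) {  /* inline-cache hit: no lookup */
        site->hits++;
        return site->cachedMethod(receiver);
    }
    site->misses++;                  /* miss: full lookup, then refill */
    site->cachedClass = cls;
    site->cachedMethod = lookup(cls, selector);
    return site->cachedMethod(receiver);
}

/* An example "compiled method" a class might hold. */
static long doubleIt(long receiver) { return receiver * 2; }
```

In a real JIT the cache test and call are emitted as a couple of machine
instructions at the send site itself, which is why it attacks send
performance so directly.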
The fourth stage is a bit more speculative at this point, because the
concrete direction depends on what the results of stage 3 show the new
bottleneck to be. We have various candidates lined up: very high
on the list is a delayed code generator which can dramatically improve
the code quality. Next to it are changes in the object format moving to
a unified 32/64bit header model which would dramatically simplify some
tests for inline caching and primitives etc. However, since this work is
driven by product performance, it is possible (albeit unlikely at this
point) that the focus might shift towards FFI speed or float inlining.
There is no shortage of possible directions, the main issue will be to
figure out what the bottlenecks at that point are and how to address
them most efficiently.
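Since stage four is still speculative, so is this sketch: it only
illustrates why a unified header model can simplify the tests inline
caches and primitives perform. With the class identified by an index
packed at a fixed position in a single header word, the class check
becomes one mask-and-compare on both 32- and 64-bit images. The field
widths and positions below are made up for illustration, not any actual
Cog layout.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical unified object header: one 64-bit word packing a class
 * index, identity hash and format bits at fixed positions. All widths
 * are illustrative, not a real Squeak/Cog header layout. */

#define CLASS_INDEX_BITS 22
#define CLASS_INDEX_MASK ((1ull << CLASS_INDEX_BITS) - 1)

static inline uint64_t makeHeader(uint32_t classIndex, uint32_t hash,
                                  uint32_t format) {
    return (uint64_t)(classIndex & CLASS_INDEX_MASK)
         | ((uint64_t)(hash & 0x3fffff) << 22)
         | ((uint64_t)(format & 0x1f) << 44);
}

static inline uint32_t classIndexOf(uint64_t header) {
    return (uint32_t)(header & CLASS_INDEX_MASK);
}

/* The inline-cache class test reduces to a single mask-and-compare;
 * no class pointer has to be loaded and followed. */
static inline int cacheHit(uint64_t header, uint32_t cachedClassIndex) {
    return classIndexOf(header) == cachedClassIndex;
}
```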
Stage four won't be the end of it, but from where we are this is how far
we've planned at this point. And if you want to know all the gory
details about the stuff that Eliot's working on, please do check out his
blog.