[squeak-dev] Cog VM status update

Wed Dec 17 09:58:41 UTC 2008

Hi Andreas,

thanks for sharing. This was really informative and interesting.
From the reading it sounds damn impressive.

Norbert

On Tue, 2008-12-16 at 19:30 -0800, Andreas Raab wrote:
> Folks -
> 
> I just read Eliot's most recent blog post about the Cog VM and it 
> reminded me how difficult it must be for others to see where this 
> project stands. So here is a bit of an update on the current status:
> 
> As you may recall we started this project in spring this year by hiring 
> Eliot for the express purpose of building us a new VM that would speed 
> up execution of our products. We decided to structure the work into 
> stages, at the end of each of which there would be a tangible 
> deliverable, i.e., a new VM that could be run and benchmarked.
> 
> The first stage in this process is what we call the "Closure VM". It is 
> nothing more (and nothing less) than a Squeak VM with closures and the 
> required support (compiler, decompiler, debugger etc). Given past 
> experience, we had originally expected this stage to cost us some speed 
> (up to 20% was estimated) since closure support has a cost which at 
> that stage is hard to offset with other improvements. However, thanks to 
> a truly ingenious bit of engineering done by Eliot in the design for the 
> closure implementation the resulting speed difference was negligible. 
> Since there was no speed penalty we decided to jump ship earlier than we 
> originally anticipated and the Closure VM has been the regular shipping 
> VM with Qwaq products since September this year.
> 
> The second stage in the process is the "Stack VM". It is a Closure VM 
> that executes on the native stack and transparently maps contexts from 
> and to stack frames as required. The VM itself is still an interpreter 
> so any speed improvements come purely from the more efficient 
> organization of the stack layout (no allocations, overlapping frames 
> etc). For those of you who have been around long enough, it is 
> equivalent to what Anthony Hannan did a few years ago, except that it 
> hides the existence of the native stack entirely and gives the 
> programmer the naive view of just dealing with linked frames (contexts). 
> The original expectations for the resulting speedups by Eliot were a 
> little higher than we've seen in practice, but are in line with the 
> results that Anthony got: approx. 30% improvements across the board in 
> macro benchmarks. The work on the Stack VM was completed last month; we 
> are currently rolling it out internally and the next product release 
> will ship with the Stack VM.
> 
> The third stage, which has just begun, is what we call the "Simple JIT VM" 
> (well, really it doesn't have a name yet, I just made it up ;-) Its 
> focus is send performance (as we see send performance as the single 
> biggest current bottleneck). It will sport a very simple JIT w/ inline 
> caches with the idea being to bring up send performance to the point 
> where it's no longer the single biggest bottleneck, then measure 
> performance again and figure out what the next best target is. I am not 
> going to speculate on performance (we have been wrong every single step 
> of the way ;-) but both Eliot and I do think that we'll see some nice 
> improvements in application performance here.
> 
> The fourth stage is a bit more speculative at this point because the 
> concrete direction depends on what the results of stage 3 really show 
> the new bottleneck to be. We have various candidates lined up: Very high 
> on the list is a delayed code generator which can dramatically improve 
> the code quality. Next to it are changes in the object format moving to 
> a unified 32/64-bit header model which would dramatically simplify some 
> tests for inline caching and primitives etc. However, since this work is 
> driven by product performance, it is possible (albeit unlikely at this 
> point) that the focus might shift towards FFI speed or float inlining. 
> There is no shortage of possible directions; the main issue will be to 
> figure out what the bottlenecks at that point are and how to address 
> them most efficiently.
> 
> Stage four won't be the end of it, but from where we are this is how far 
> we've planned at this point. And if you want to know all the gory 
> details about the stuff that Eliot's working on, please do check out his 
> blog at:
> 
>    http://cogblog.mirandabanda.org/
> 
> Cheers,
>    - Andreas