Hi Goran,
Btw, I just can't help asking - what is the current status on the Jitter project? I have been discussing Squeak and performance on the SWESUG list lately and someone asked about it.
J4 kind of ran out of steam about a year ago. I recently picked up the pieces and ripped out and redesigned vast tracts of it to make "j5" with the two fundamental goals being:
- a framework designed from the ground up to do adaptive compilation based on run-time type feedback (previous incarnations simply tried to apply more aggressive versions of the interpreter's "heuristic" optimisations, kind of like how VW does things now); and
- zero runtime overhead due to GC complications. (These were a significant source of performance loss in earlier versions due to lots of ridiculous indirections through "gc-safe" structures to keep raw oops out of native code. Now I just remap the code cache: much simpler [and essentially zero-overhead ;-].)
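(To illustrate the "remap the code cache" idea: native code embeds raw oops directly, and a side table records where each one lives; after a GC that moves objects, a single pass over the table patches in the new addresses, so the fast path pays no indirection. This is only a sketch of the technique under those assumptions; all names here are illustrative, not from the j5 source.)

```cpp
#include <cstddef>
#include <map>
#include <vector>

typedef void* Oop;

// A compiled method's machine code, modelled as a word array, plus a side
// table of offsets at which raw oops are embedded in that code.
struct CompiledMethod {
    std::vector<Oop>    code;        // stands in for native code with embedded oops
    std::vector<size_t> oopOffsets;  // where the embedded oops live within 'code'
};

// Old-address -> new-address map, as produced by a moving/compacting GC.
typedef std::map<Oop, Oop> ForwardingTable;

// After GC, rewrite every embedded oop in place with its forwarded address.
void remapCodeCache(CompiledMethod& m, const ForwardingTable& fwd) {
    for (size_t i = 0; i < m.oopOffsets.size(); ++i) {
        Oop& slot = m.code[m.oopOffsets[i]];
        ForwardingTable::const_iterator it = fwd.find(slot);
        if (it != fwd.end())
            slot = it->second;  // patch in the object's new location
    }
}
```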
The new framework was finished about a month ago with the non-inlining compiler producing correct code for PPC and 386 on Unix (it comes to about 26,000 lines of C++ in total). Since then I've been tidying up the code (why have four different kinds of list when one will do? ;), making it work with the Win32 VM (with valuable initial help from Andreas) and fixing the last of the obscure problems in dependency/cache management and their interaction with the GC. (A very smart move was giving a rough copy to Andreas who immediately started hammering on it hard: the first thing he tried was adding inst vars to Morph. ;) Right now it feels pretty solid, much more so than any of the previous incarnations. (I trust it implicitly with my working image.)
I was hoping to have the simple inlining compiler working in time for OOPSLA (since that would be the point at which significant performance gains would become evident) but I'm not sure if other responsibilities are going to allow the time for that to happen. (Amongst other things, it's probably time I started thinking about writing my OOPSLA workshop presentation. ;)
Performance of the non-inlining compiler is currently around parity with the regular interpreter, or a few tens of percent better. The primary goal of the NIC is to minimise the compilation time of "instrumented" native code, rather than reduce the execution time of that code. (Pauses due to dynamic compilation are currently undetectable.)
But one has to understand the context of this "parity" performance. The NIC is performing _no_ optimisation whatsoever, and is _ignoring_ all of the tricks that the bytecode interpreter does (and that earlier jitters did). E.g., the NIC has no notion whatsoever of "special" or "arithmetic" sends. Absolutely every message really is sent (even for arithmetic, Point accessors, #value, etc.) and every send goes through a profiling cache (which records destinations and receiver types, and characterises the amount of polymorphism at each send site). There are no global method or `at' caches either. (The at-cache is subsumed into the inline caches by the optimising compilers.) What's important is the combination of type information that this code gathers in the polymorphic caches, and the characteristics (primitive index, etc.) of the corresponding destinations. The inlining compilers then use this information to do way better optimisation than any heuristically-driven optimiser (based on special selector indices, common sequences of bytecodes, or whatever) could ever do. (A secondary benefit is that selectors are completely insignificant. Renaming #+ or #at: or #value incurs no performance penalty at all.)
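(A minimal sketch of such a per-send-site profiling cache might look like the following. It records the receiver classes and destination methods seen at one send site and characterises the site's polymorphism; the names, the four-entry limit, and the "megamorphic" cutoff are my own illustrative assumptions, not j5's actual layout.)

```cpp
#include <cstddef>

typedef void* Oop;     // a class pointer stands in for the receiver type
typedef void* Method;  // the resolved destination for that receiver type

// One cache per send site: remembers (receiver class, destination) pairs
// and thereby characterises how polymorphic the site is.
struct SendSiteCache {
    static const int MaxEntries = 4;  // beyond this, call the site "megamorphic"
    Oop    receiverClass[MaxEntries];
    Method destination[MaxEntries];
    int    entries;

    SendSiteCache() : entries(0) {}

    // Record one dynamic send: remember the pair if this class is new here.
    void record(Oop cls, Method dest) {
        for (int i = 0; i < entries; ++i)
            if (receiverClass[i] == cls) return;  // class already seen
        if (entries < MaxEntries) {
            receiverClass[entries] = cls;
            destination[entries]   = dest;
            ++entries;
        }
    }

    bool isMonomorphic() const { return entries == 1; }
    bool isMegamorphic() const { return entries >= MaxEntries; }
};
```

An inlining compiler can then read these caches offline: a monomorphic site is a candidate for a direct (or inlined) call guarded by a single class check, while a megamorphic site is left as a full send.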
I'm going to wallow in the current "contraction" phase for a while longer, tidying things up here and there, before extending the IR with the information required to do simple inlining. (By "simple" I mean where there are no collapsed scopes -- i.e., inlining completely only those things [like primitives, quick responses, and trivial methods] that we know will never need to activate and which contain no interior synchronisation points [where an interrupt check might try to swap process, for example].) Once that's been bled dry I'll start thinking about inlining nontrivial methods, along with all the "dynamic deoptimisation" headaches that come with it.
See you at OOPSLA.
Indeed! (I even got in at the "Priceline Hyatt" for $50. :^)
Regards,
Ian