Hi Goran,
Btw, I just can't help asking - what is the current status on the Jitter project? I have been discussing Squeak and performance on the SWESUG list lately and someone asked about it.
J4 kind of ran out of steam about a year ago. I recently picked up the pieces and ripped out and redesigned vast tracts of it to make "j5" with the two fundamental goals being:
- a framework designed from the ground up to do adaptive compilation based on run-time type feedback (previous incarnations simply tried to apply more aggressive versions of the interpreter's "heuristic" optimisations, kind of like how VW does things now); and
- zero runtime overhead due to GC complications. (These were a significant source of performance loss in earlier versions due to lots of ridiculous indirections through "gc-safe" structures to keep raw oops out of native code. Now I just remap the code cache: much simpler [and essentially zero-overhead ;-].)
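(To illustrate the "remap the code cache" idea: native code embeds raw oops directly, and a side table records where each one lives; after a GC that moves objects, a single pass over the table patches in the new addresses, so the fast path pays no indirection. This is only a sketch of the technique under those assumptions; all names here are illustrative, not from the j5 source.)

```cpp
#include <cstddef>
#include <map>
#include <vector>

typedef void* Oop;

// A compiled method's machine code, modelled as a word array, plus a side
// table of offsets at which raw oops are embedded in that code.
struct CompiledMethod {
    std::vector<Oop>    code;        // stands in for native code with embedded oops
    std::vector<size_t> oopOffsets;  // where the embedded oops live within 'code'
};

// Old-address -> new-address map, as produced by a moving/compacting GC.
typedef std::map<Oop, Oop> ForwardingTable;

// After GC, rewrite every embedded oop in place with its forwarded address.
void remapCodeCache(CompiledMethod& m, const ForwardingTable& fwd) {
    for (size_t i = 0; i < m.oopOffsets.size(); ++i) {
        Oop& slot = m.code[m.oopOffsets[i]];
        ForwardingTable::const_iterator it = fwd.find(slot);
        if (it != fwd.end())
            slot = it->second;  // patch in the object's new location
    }
}
```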
The new framework was finished about a month ago with the non-inlining compiler producing correct code for PPC and 386 on Unix (it comes to about 26,000 lines of C++ in total). Since then I've been tidying up the code (why have four different kinds of list when one will do? ;), making it work with the Win32 VM (with valuable initial help from Andreas) and fixing the last of the obscure problems in dependency/cache management and their interaction with the GC. (A very smart move was giving a rough copy to Andreas who immediately started hammering on it hard: the first thing he tried was adding inst vars to Morph. ;) Right now it feels pretty solid, much more so than any of the previous incarnations. (I trust it implicitly with my working image.)
I was hoping to have the simple inlining compiler working in time for OOPSLA (since that would be the point at which significant performance gains would become evident) but I'm not sure if other responsibilities are going to allow the time for that to happen. (Amongst other things, it's probably time I started thinking about writing my OOPSLA workshop presentation. ;)
Performance of the non-inlining compiler is currently around parity with the regular interpreter, or a few tens of percent better. The primary goal of the NIC is to minimise the compilation time of "instrumented" native code, rather than reduce the execution time of that code. (Pauses due to dynamic compilation are currently undetectable.)
But one has to understand the context of this "parity" performance. The NIC is performing _no_ optimisation whatsoever, and is _ignoring_ all of the tricks that the bytecode interpreter does (and that earlier jitters did). E.g., the NIC has no notion whatsoever of "special" or "arithmetic" sends. Absolutely every message really is sent (even for arithmetic, Point accessors, #value, etc.) and every send goes through a profiling cache (which records destinations and receiver types, and characterises the amount of polymorphism at each send site). There are no global method or `at' caches either. (The at-cache is subsumed into the inline caches by the optimising compilers.) What's important is the combination of type information that this code gathers in the polymorphic caches, and the characteristics (primitive index, etc.) of the corresponding destinations. The inlining compilers then use this information to do way better optimisation than any heuristically-driven optimiser (based on special selector indices, common sequences of bytecodes, or whatever) could ever do. (A secondary benefit is that selectors are completely insignificant. Renaming #+ or #at: or #value incurs no performance penalty at all.)
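(A minimal sketch of such a per-send-site profiling cache might look like the following. It records the receiver classes and destination methods seen at one send site and characterises the site's polymorphism; the names, the four-entry limit, and the "megamorphic" cutoff are my own illustrative assumptions, not j5's actual layout.)

```cpp
#include <cstddef>

typedef void* Oop;     // a class pointer stands in for the receiver type
typedef void* Method;  // the resolved destination for that receiver type

// One cache per send site: remembers (receiver class, destination) pairs
// and thereby characterises how polymorphic the site is.
struct SendSiteCache {
    static const int MaxEntries = 4;  // beyond this, call the site "megamorphic"
    Oop    receiverClass[MaxEntries];
    Method destination[MaxEntries];
    int    entries;

    SendSiteCache() : entries(0) {}

    // Record one dynamic send: remember the pair if this class is new here.
    void record(Oop cls, Method dest) {
        for (int i = 0; i < entries; ++i)
            if (receiverClass[i] == cls) return;  // class already seen
        if (entries < MaxEntries) {
            receiverClass[entries] = cls;
            destination[entries]   = dest;
            ++entries;
        }
    }

    bool isMonomorphic() const { return entries == 1; }
    bool isMegamorphic() const { return entries >= MaxEntries; }
};
```

An inlining compiler can then read these caches offline: a monomorphic site is a candidate for a direct (or inlined) call guarded by a single class check, while a megamorphic site is left as a full send.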
I'm going to wallow in the current "contraction" phase for a while longer, tidying things up here and there, before extending the IR with the information required to do simple inlining. (By "simple" I mean where there are no collapsed scopes -- i.e., inlining completely only those things [like primitives, quick responses, and trivial methods] that we know will never need to activate and which contain no interior synchronisation points [where an interrupt check might try to swap process, for example].) Once that's been bled dry I'll start thinking about inlining nontrivial methods, along with all the "dynamic deoptimisation" headaches that come with it.
See you at OOPSLA.
Indeed! (I even got in at the "Priceline Hyatt" for $50. :^)
Regards,
Ian