[Vm-dev] Re: [squeak-dev] Short-circuiting comparisons (booleanCheat) via special selectors [Was Using #= for integer comparison instead of #==]

Sat Nov 20 17:32:03 UTC 2010

On Sat, Nov 20, 2010 at 8:19 AM, Eliot Miranda <eliot.miranda at gmail.com> wrote:

> Yes, I agree. But I don't think that AST vs bytecode is really anything to do
with it; they can be easily transformed into each other (via decompiler &
compiler). The bytecode is a convenient form because it is compact and can
efficiently be interpreted. The issue is *when* and *where* to spend the
cycles trying to optimise aggressively. That's where performance counters come
in. If one decorates the jitted code with e.g. a taken and untaken count at
each conditional branch then when these counters trip one suspends execution,
examines the current call stack, collecting concrete type information from
inline caches, and optimises several nested activations into a single large
method that is worth optimising with traditional static techniques (good
register allocation etc). If one tries to optimise everything the system
becomes unresponsive (see Craig Chambers' Self 2 compiler). If one defers
optimization until finding a "hot spot" things work much better (see Urs
Hölzle's Self 3 compiler, HotSpot et al).

> So

> - keep bytecode and an interpreter for compactness, portability and the
   ability to always fall back on the interpreter (e.g. when the JIT runs out
   of memory during some tricky relinking operation)

> - use a simple JIT to optimize code run more than once, that does a
   reasonable job of stack to register mapping, implements PICs to collect
   type info and performance counters to collect block usage and invoke the
   aggressive optimizer

> - use a speculative inliner and an aggressive optimiser to inline code based
   on hot spots, basic block counts, and PIC info, and optimize it using
   traditional techniques.

> All of the above exists in various production VMs, AFAIA none all in the
same place. So the above is arguably a proven architecture. Hence it is my
direction (and Marcus' and I hope yours too). I can send you my architectural
sketch if you're interested.
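
To make the counter mechanism in the quoted text concrete, here is a minimal sketch in Python. It is not taken from any real VM: the threshold, the class names (`BranchCounter`, `Frame`), the helper `collect_type_info`, and the toy stack contents are all assumptions for illustration. It only shows the shape of the idea: a taken/untaken counter on a conditional branch trips, and at that point type information is gathered from the inline caches of several nested activations.

```python
from dataclasses import dataclass, field

TRIP_THRESHOLD = 3  # hypothetical; a real VM would use a far larger value


@dataclass
class BranchCounter:
    """Taken/untaken counts attached to one conditional branch in jitted code."""
    taken: int = 0
    untaken: int = 0

    def record(self, was_taken: bool) -> bool:
        """Count one execution; return True once the counter trips."""
        if was_taken:
            self.taken += 1
        else:
            self.untaken += 1
        return self.taken + self.untaken >= TRIP_THRESHOLD


@dataclass
class Frame:
    """One activation on a toy call stack (outermost frame first in the list)."""
    method: str
    # Receiver classes seen per selector, standing in for inline-cache contents.
    cached_types: dict = field(default_factory=dict)


def collect_type_info(stack, depth):
    """Gather inline-cache data from the `depth` innermost activations."""
    return {frame.method: dict(frame.cached_types) for frame in stack[-depth:]}


# Hypothetical stack: names and cached classes are made up for illustration.
stack = [
    Frame("main"),
    Frame("outer", {"at:": ["Array"]}),
    Frame("inner", {"+": ["SmallInteger"]}),
]
counter = BranchCounter()
tripped = [counter.record(was_taken) for was_taken in (True, False, True)]
# The third execution reaches the threshold, so the optimizer would run here.
type_info = collect_type_info(stack, depth=2) if tripped[-1] else {}
```

In this toy run the counter trips on the third execution of the branch, and the collected information covers the two innermost activations, which is what a speculative inliner would then fold into one larger method.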

That would seem to describe Strongtalk pretty well, no? Or does it collect
type and profile information directly from the interpreter?

Also, my impression was that Strongtalk, HotSpot et al did inlining *down* the
stack. That is, it would find a method that was activated frequently, then
inline whatever sends and block activations it performed to get a large,
statically optimizable method.

What you mention above is slightly different, IIUC, in that you're finding a
hot spot based on counters in basic blocks, then looking up the stack to find
a method that can be aggressively optimized. This sounds a bit like the
trace-based inlining that Mozilla used for their JavaScript engine, in that
you're effectively choosing to optimize a particular path through the code
rather than a particular method. Thoughts?
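
The difference between the two strategies, as characterised in this message (not necessarily how any of these VMs actually works), can be sketched as two ways of choosing the root method for optimization. The function names, the invocation counts, and the stack below are hypothetical.

```python
def root_by_invocation_count(invocation_counts):
    """Top-down: pick the most frequently activated method, then inline
    its sends and block activations *down* from there."""
    return max(invocation_counts, key=invocation_counts.get)


def root_by_stack_walk(stack, max_depth):
    """Bottom-up: a counter tripped in the innermost frame; walk *up* the
    stack (at most max_depth activations) and use that caller as the root."""
    return stack[max(0, len(stack) - max_depth)]


# Hypothetical call stack, outermost activation first.
stack = ["main", "render", "drawAll", "drawOne"]
# Hypothetical activation counts per method.
invocation_counts = {"main": 1, "render": 10, "drawAll": 10, "drawOne": 5000}

top_down_root = root_by_invocation_count(invocation_counts)
bottom_up_root = root_by_stack_walk(stack, max_depth=3)
```

With these made-up numbers the top-down choice is the leaf `drawOne`, while the bottom-up walk selects `render`, so the optimized unit follows the particular call path that was live when the counter tripped.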

Colin