[Vm-dev] Interpreter>>isContextHeader: optimization

Sun Feb 22 20:11:53 UTC 2009

On Sun, Feb 22, 2009 at 10:37 AM, <bryce at kampjes.demon.co.uk> wrote:

>
> Eliot Miranda writes:
>  >
>  > But what I really think is that this is too low a level to worry about.
>  >  Much more important to focus on
>  > - context to stack mapping
>  > - in-line caching via a JIT
>  > - exploiting multicore via Hydra
>  > and beyond (e.g. speculative inlining)
>  > than worrying about tiny micro-optimizations like this :)
>
> If you're planning on adding speculative, I assume Self style dynamic,
> inlining won't that reduce the value of context to stack mapping?

Not at all; in fact quite the reverse.  Context to stack mapping allows one
to retain contexts while having the VM execute efficient, stack-based code
(i.e. using hardware call instructions).  This in turn enables the entire
adaptive optimizer, including the stack analyser and the
bytecode-to-bytecode compiler/method inliner to be written in Smalltalk.
 The image level code can examine the run-time stack using contexts as their
interface without having to understand native stack formats or different
ISAs.  The optimizer is therefore completely portable with all machine
specificities confined to the underlying VM which is much simpler by virtue
of not containing a sophisticated optimizer (which one would have to squeeze
through Slang etc).

So for me, context-to-stack mapping is fundamental to implementing
speculative inlining in Smalltalk.
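
As a rough illustration of the idea above, here is a minimal sketch in Python (not Smalltalk or Slang, and not how any actual VM lays out its frames). All names (Frame, Context, Stack, selectors_on_stack) are hypothetical. It shows activations living on a plain stack, with context objects created lazily only when image-level code asks to look at a frame. A real implementation would also have to deal with contexts that outlive their frames, which this sketch ignores.

```python
# Hypothetical sketch: frames live on a plain stack; Context objects are
# created lazily, only when image-level code asks to inspect a frame.

class Frame:
    """A VM-internal activation record (stands in for a native stack frame)."""
    def __init__(self, method, receiver, args):
        self.method = method
        self.receiver = receiver
        self.args = list(args)
        self.context = None  # no context object until someone asks for one


class Context:
    """Image-level view of a frame; hides the stack layout from its user."""
    def __init__(self, stack, index):
        self._stack = stack
        self._index = index

    @property
    def method(self):
        return self._stack.frames[self._index].method

    @property
    def receiver(self):
        return self._stack.frames[self._index].receiver

    @property
    def sender(self):
        # The sender is the frame below; None at the bottom of the stack.
        if self._index == 0:
            return None
        return self._stack.context_for(self._index - 1)


class Stack:
    def __init__(self):
        self.frames = []
        self.contexts_created = 0

    def call(self, method, receiver, args=()):
        self.frames.append(Frame(method, receiver, args))

    def ret(self):
        self.frames.pop()

    def context_for(self, index):
        frame = self.frames[index]
        if frame.context is None:
            frame.context = Context(self, index)
            self.contexts_created += 1
        return frame.context

    def this_context(self):
        return self.context_for(len(self.frames) - 1)


def selectors_on_stack(stack):
    """What an image-level stack analyser might do: walk the sender chain."""
    names = []
    ctx = stack.this_context()
    while ctx is not None:
        names.append(ctx.method)
        ctx = ctx.sender
    return names
```

Calls and returns here never allocate a context; only selectors_on_stack does, and it sees nothing but the Context interface.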

> My view with Exupery is context caches should be left until after
> dynamic inlining as their value will depend on how well dynamic
> inlining reduces the number of sends.
>

I know and I disagree.  Dynamic inlining depends on collecting good type
information, something that inline caches do well.  In-line caches are
efficiently implemented with native call instructions, either to method
entry-points or PIC jump tables.  Native call instructions mesh well with
stacks.  So context-to-stack mapping, for me, is a sensible enabling
optimization for speculative inlining because it meshes well with inline
caches.
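
To make the type-information point concrete, here is a small sketch in Python of a send site with a polymorphic inline cache. It is only a model: a real in-line cache is patched machine code at the call site, not a dictionary, and the names CallSite, MAX_ENTRIES and observed_types are made up for this illustration. What it shows is that the cache, as a side effect of speeding up sends, ends up holding the receiver classes actually seen at that site, which is the information a speculative inliner wants.

```python
class CallSite:
    """One send site with a small polymorphic inline cache (PIC)."""
    MAX_ENTRIES = 4  # assumed limit, chosen only for illustration

    def __init__(self, selector):
        self.selector = selector
        self.cache = {}  # receiver class -> method found by lookup
        self.misses = 0

    def send(self, receiver, *args):
        cls = type(receiver)
        method = self.cache.get(cls)
        if method is None:
            # Cache miss: do the full (slow) lookup, then remember the result.
            self.misses += 1
            method = getattr(cls, self.selector)
            if len(self.cache) < self.MAX_ENTRIES:
                self.cache[cls] = method
        return method(receiver, *args)

    def observed_types(self):
        """Type feedback an optimizer could read back from the cache."""
        return list(self.cache)
```

A site that has only ever seen one class in observed_types() is a candidate for inlining the cached method behind a class check.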

Further, context-to-stack mapping is such a huge win that it'll be of
benefit even if the VM is spending 90% of its time in inlined call-less
code.  We see a speedup of very nearly 2x (48% sticks in my head) for one
non-micro tree walking benchmark from the computer language shootout.  And
this is in a very slow VM.  In a faster VM context-to-stack mapping would be
even more valuable, because it would save an even greater percentage of
overall execution time.
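
The "greater percentage" argument can be checked with a little arithmetic. The sketch below assumes that context-to-stack mapping removes a fixed amount of send overhead regardless of how fast the rest of the VM is; the function name and all the numbers are invented for illustration and are not measurements.

```python
def fraction_saved(send_overhead, other_work, overhead_removed):
    """Fraction of total run time saved by removing part of the send overhead.

    All arguments are in the same arbitrary time units.
    """
    total = send_overhead + other_work
    return overhead_removed / total


# Illustrative numbers only: the same 40 units of send overhead are removed
# in both cases, but the faster VM spends far less time on everything else.
slow_vm = fraction_saved(send_overhead=50, other_work=50, overhead_removed=40)
fast_vm = fraction_saved(send_overhead=50, other_work=10, overhead_removed=40)
```

With these made-up figures the saving is 40% of run time in the slow VM and about 67% in the fast one.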

Further still, using call & return instructions as conventionally as possible
meshes extremely well with current processor implementations which, because
of the extensive use thereon of conventional stack-oriented language
implementations, have done a great job optimizing call/return.

Further still, the current performance of call/return on contemporary
processors, specifically prefetch across call & return (prefetch across
return only possible if one sticks to the processor's expected stack
organization of return addresses) renders call/return performance the same
as jumps.  So the benefits of inlining are no longer in eliminating
call/return, but rather in eliminating dispatch, argument copying, etc.  So
inlining per se isn't of benefit, and it can actually worsen instruction cache
density; analysis and elimination of dispatch is what pays off.  So again
context-to-stack mapping makes sense because it means the speculative
inliner/adaptive optimizer doesn't have to focus on creating humongous methods
or inlining accessors etc., and can focus on higher level optimizations like
block removal (lambda lifting?), common subexpression elimination, and so on.

best
Eliot