[Vm-dev] Interpreter>>isContextHeader: optimization

Eliot Miranda eliot.miranda at gmail.com
Mon Feb 23 01:54:50 UTC 2009


On Sun, Feb 22, 2009 at 12:54 PM, <bryce at kampjes.demon.co.uk> wrote:

>
> Eliot Miranda writes:
>  > On Sun, Feb 22, 2009 at 10:37 AM, <bryce at kampjes.demon.co.uk> wrote:
>  >
>  > >
>  > > Eliot Miranda writes:
>  > >  >
>  > >  > But what I really think is that this is too low a level to worry
>  > >  > about.  Much more important to focus on
>  > >  > - context to stack mapping
>  > >  > - in-line caching via a JIT
>  > >  > - exploiting multicore via Hydra
>  > >  > and beyond (e.g. speculative inlining)
>  > >  > than worrying about tiny micro-optimizations like this :)
>  > >
>  > > If you're planning on adding speculative (I assume Self-style
>  > > dynamic) inlining, won't that reduce the value of context to stack
>  > > mapping?
>  >
>  >
>  > Not at all; in fact quite the reverse.  Context-to-stack mapping allows
>  > one to retain contexts while having the VM execute efficient,
>  > stack-based code (i.e. using hardware call instructions).  This in turn
>  > enables the entire adaptive optimizer, including the stack analyser and
>  > the bytecode-to-bytecode compiler/method inliner, to be written in
>  > Smalltalk.  The image-level code can examine the run-time stack using
>  > contexts as its interface, without having to understand native stack
>  > formats or different ISAs.  The optimizer is therefore completely
>  > portable, with all machine specificities confined to the underlying VM,
>  > which is much simpler by virtue of not containing a sophisticated
>  > optimizer (which one would have to squeeze through Slang etc).
>
> All you need is for the optimiser to run early in compilation for it to
> be portable.


...and for it to be untimely.  An adaptive optimizer by definition needs
to run intermittently throughout execution.  It optimizes what is
happening now, not what happened at start-up.
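
To make the contexts-as-interface point concrete, here is a rough C
sketch of a frame that can be lazily "married" to a context.  The names
and layout are mine for illustration; they are not the actual VM
structures:

#include <stdlib.h>

typedef struct Frame Frame;

typedef struct Context {    /* ordinary heap object, visible to the image */
    Frame *frame;           /* back-pointer while its frame is still live */
} Context;

struct Frame {              /* machine stack frame pushed by compiled code */
    Frame   *callerFP;      /* saved frame pointer: hardware-style linkage */
    void    *method;        /* compiled method running in this frame */
    Context *context;       /* NULL while "single"; set once "married" */
    /* ... receiver, arguments, temporaries, value stack ... */
};

/* Marry a frame to a context only when image-level code reflects on the
 * stack (thisContext, the debugger, the optimizer walking frames), so
 * straight-line execution pays nothing for contexts it never reifies. */
Context *contextForFrame(Frame *fp)
{
    if (fp->context == NULL) {
        fp->context = malloc(sizeof(Context));  /* stands in for GC allocation */
        fp->context->frame = fp;
    }
    return fp->context;
}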

> And we definitely agree on trying to keep complex logic out of the
> VM. Sounds like you're thinking of AoSTa.
>

yes (AOStA).


>  > So for me, context-to-stack mapping is fundamental to implementing
>  > speculative inlining in Smalltalk.
>  >
>  > > My view with Exupery is that context caches should be left until
>  > > after dynamic inlining, as their value will depend on how well
>  > > dynamic inlining reduces the number of sends.
>  >
>  > I know, and I disagree.  Dynamic inlining depends on collecting good
>  > type information, something that inline caches do well.  Inline caches
>  > are efficiently implemented with native call instructions, either to
>  > method entry-points or to PIC jump tables.  Native call instructions
>  > mesh well with stacks.  So context-to-stack mapping, for me, is a
>  > sensible enabling optimization for speculative inlining, because it
>  > meshes well with inline caches.
>
> PICs are a separate issue. Exupery has PICs, and has had them for
> years now. PICs are just as easily implemented as jumps.
>

Yes, PICs are jump tables.  But, at least in my implementation and in others
I know of, they get called.  They are composed of a jump table that then
jumps into methods at a point past any entry-point dynamic-binding/type
checking.
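
To illustrate the distinction with a rough sketch (hypothetical names,
and C function pointers standing in for the call instructions a real
JIT would patch):

#include <stddef.h>

typedef struct Object { int classTag; } Object;
typedef Object *(*Method)(Object *receiver);

/* Slow path: full method lookup; stubbed so the sketch is self-contained. */
static Object *fullLookupAndCall(Object *rcvr) { (void)rcvr; return NULL; }

/* Monomorphic inline cache: the send site remembers one expected class
 * and one target.  In a JIT this is a native call whose target method
 * re-checks the receiver's class at its entry point. */
typedef struct SendSite {
    int    cachedClassTag;
    Method cachedMethod;
} SendSite;

static Object *send(SendSite *site, Object *rcvr)
{
    if (rcvr->classTag == site->cachedClassTag)
        return site->cachedMethod(rcvr);  /* hit: effectively a direct call */
    return fullLookupAndCall(rcvr);       /* miss: relink, or grow into a PIC */
}

/* Polymorphic inline cache: a small jump table the send site calls.
 * Each target is a method entry point *past* the class check, since
 * the PIC has already verified the receiver's class. */
enum { PIC_SLOTS = 6 };
typedef struct PIC {
    int    classTags[PIC_SLOTS];
    Method targets[PIC_SLOTS];            /* post-check entry points */
    int    used;
} PIC;

static Object *picDispatch(PIC *pic, Object *rcvr)
{
    for (int i = 0; i < pic->used; i++)
        if (pic->classTags[i] == rcvr->classTag)
            return pic->targets[i](rcvr);
    return fullLookupAndCall(rcvr);       /* extend the PIC, or go megamorphic */
}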


>  > Further, context-to-stack mapping is such a huge win that it'll be of
>  > benefit even if the VM is spending 90% of its time in inlined call-less
>  > code.  We see a speedup of very nearly 2x (48% sticks in my head) for
>  > one non-micro tree-walking benchmark from the computer language
>  > shootout.  And this is in a very slow VM.  In a faster VM,
>  > context-to-stack mapping would be even more valuable, because it would
>  > save an even greater percentage of overall execution time.
>
> I see only one sixth of the time going into context creation for the
> send benchmark, which is about as send-heavy as you can get. That's
> running native code at about twice Squeak's speed. Also, there's still
> plenty of inefficiency in Exupery's call/return sequences.


So you could get a 17% speedup if you could remove the context overhead.
 That's quite a tidy gain.  I see a 26% increase in benchFib performance
between base Squeak and the StackVM with no native code at all.
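
Spelling out that arithmetic (an estimate, assuming nothing else
changes): if one sixth of total time goes into context creation, then

    time saved = 1/6 of runtime              (the ~17% above)
    speedup    = 1 / (1 - 1/6) = 6/5 = 1.2x

i.e. roughly a 1.2x throughput gain on that benchmark.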

What are the inefficiencies in Exupery's call/return sequences?

>  > Further still, using call & return instructions as conventionally as
>  > possible meshes extremely well with current processor implementations
>  > which, because of the extensive use thereon of conventional
>  > stack-oriented language implementations, have done a great job of
>  > optimizing call/return.
>
> Unconditional jumps for sends also benefit from hardware
> optimisation. Returns turn into indirect jumps, which are less
> efficient, but getting better with Core 2.


and Power

>
>
> Cheers
> Bryce
>