[Vm-dev] Interpreter>>isContextHeader: optimization
Eliot Miranda
eliot.miranda at gmail.com
Mon Feb 23 01:54:50 UTC 2009
On Sun, Feb 22, 2009 at 12:54 PM, <bryce at kampjes.demon.co.uk> wrote:
>
> Eliot Miranda writes:
> > On Sun, Feb 22, 2009 at 10:37 AM, <bryce at kampjes.demon.co.uk> wrote:
> >
> > >
> > > Eliot Miranda writes:
> > > >
> > > > But what I really think is that this is too low a level to worry about.
> > > > Much more important to focus on
> > > > - context to stack mapping
> > > > - in-line caching via a JIT
> > > > - exploiting multicore via Hydra
> > > > and beyond (e.g. speculative inlining)
> > > > than worrying about tiny micro-optimizations like this :)
> > >
> > > If you're planning on adding speculative (I assume Self-style dynamic)
> > > inlining, won't that reduce the value of context to stack mapping?
> >
> >
> > Not at all; in fact quite the reverse. Context to stack mapping allows one
> > to retain contexts while having the VM execute efficient, stack-based code
> > (i.e. using hardware call instructions). This in turn enables the entire
> > adaptive optimizer, including the stack analyser and the
> > bytecode-to-bytecode compiler/method inliner to be written in Smalltalk.
> > The image level code can examine the run-time stack using contexts as their
> > interface without having to understand native stack formats or different
> > ISAs. The optimizer is therefore completely portable with all machine
> > specificities confined to the underlying VM which is much simpler by virtue
> > of not containing a sophisticated optimizer (which one would have to squeeze
> > through Slang etc).
>
> All you need is the optimiser to run early in compilation for it to be
> portable.
...and for it to be untimely. An adaptive optimizer by definition needs to
be running intermittently all the time. It optimizes what is happening now,
not what happened at start-up.
> And we definitely agree on trying to keep complex logic out of the
> VM. Sounds like you're thinking of AoSTa.
>
yes (AOStA).
> > So for me, context-to-stack mapping is fundamental to implementing
> > speculative inlining in Smalltalk.
> >
> >
> > > My view with Exupery is context caches should be left until after
> > > dynamic inlining as their value will depend on how well dynamic
> > > inlining reduces the number of sends.
> > >
> >
> > I know and I disagree. Dynamic inlining depends on collecting good type
> > information, something that inline caches do well. In-line caches are
> > efficiently implemented with native call instructions, either to method
> > entry-points or PIC jump tables. Native call instructions mesh well with
> > stacks. So context-to-stack mapping, for me, is a sensible enabling
> > optimization for speculative inlining because it meshes well with inline
> > caches.
>
> PICs are a separate issue. Exupery has PICs, and has had them for
> years now. PICs are just as easily implemented as jumps.
>
Yes, PICs are jump tables. But, at least in my implementation and in others
I know of, they get called. They are composed of a jump table that then
jumps into methods at a point past any entry-point dynamic-binding/type
checking.
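
To make the type-feedback point concrete, here is a minimal sketch in Python. It is not code from Exupery or any other VM mentioned in this thread. The class CallSite, its MAX_ENTRIES limit, and observed_types are hypothetical names chosen for illustration. It models a polymorphic inline cache as a small per-send-site table from receiver class to method, and shows how an optimizer could read the receiver classes seen at that site back out of the cache.

```python
class CallSite:
    """One send site with a small polymorphic inline cache (illustrative only)."""

    MAX_ENTRIES = 4  # hypothetical cap before a real VM would go megamorphic

    def __init__(self, selector):
        self.selector = selector
        self.cache = []   # list of (receiver class, target function) pairs
        self.misses = 0   # number of full lookups performed

    def send(self, receiver, *args):
        cls = type(receiver)
        # Fast path: linear scan of the cache, standing in for the jump table.
        for cached_cls, target in self.cache:
            if cached_cls is cls:
                return target(receiver, *args)
        # Slow path: full method lookup, then extend the cache if there is room.
        self.misses += 1
        target = getattr(cls, self.selector)
        if len(self.cache) < self.MAX_ENTRIES:
            self.cache.append((cls, target))
        return target(receiver, *args)

    def observed_types(self):
        """Receiver classes seen at this site: the data an inliner would use."""
        return [cls for cls, _ in self.cache]


class Square:
    def __init__(self, side):
        self.side = side

    def area(self):
        return self.side * self.side


class Rect:
    def __init__(self, width, height):
        self.width = width
        self.height = height

    def area(self):
        return self.width * self.height


site = CallSite("area")
results = [site.send(shape) for shape in (Square(2), Rect(2, 3), Square(3))]
```

After the three sends the site has taken two misses (one per receiver class), and observed_types() lists Square and Rect, which is the kind of per-site information a speculative inliner needs.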
> > Further, context-to-stack mapping is such a huge win that it'll be of
> > benefit even if the VM is spending 90% of its time in inlined call-less
> > code. We see a speedup of very nearly 2x (48% sticks in my head) for one
> > non-micro tree walking benchmark from the computer language shootout. And
> > this is in a very slow VM. In a faster VM context-to-stack mapping would be
> > even more valuable, because it would save an even greater percentage of
> > overall execution time.
>
> I see only one sixth of the time going into context creation for the
> send benchmark which is about as send heavy as you can get. That's
> running native code at about twice Squeak's speed. Also there's still
> plenty of inefficiency in Exupery's call return sequences.
So you could get a 17% speedup if you could remove the context overhead.
That's quite a tidy gain. I see a 26% increase in benchFib performance
between base Squeak and the StackVM with no native code at all.
What are the inefficiencies in Exupery's call return sequences?
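
As a rough check on the figures in this exchange, here is a minimal Python sketch of the arithmetic. The function name is hypothetical, and reading the 48% quoted earlier as a reduction in run time is an assumption. Removing a fraction f of run time gives a throughput ratio of 1 / (1 - f).

```python
def speedup_from_time_saved(fraction_saved: float) -> float:
    """Throughput ratio (new/old) when a fraction of total run time is removed."""
    if not 0.0 <= fraction_saved < 1.0:
        raise ValueError("fraction_saved must be in [0, 1)")
    return 1.0 / (1.0 - fraction_saved)


# One sixth of run time in context creation, as reported for the send benchmark.
context_share = 1.0 / 6.0
context_ratio = speedup_from_time_saved(context_share)

# Assumption: the 48% figure is a reduction in run time.
tree_walk_ratio = speedup_from_time_saved(0.48)

print(f"{context_share:.1%} less time -> {context_ratio:.2f}x")
print(f"48.0% less time -> {tree_walk_ratio:.2f}x")
```

Under that reading, saving one sixth of run time is about 17% less time, or a ratio of 1.2x, and 48% less time is a ratio of about 1.92x, which fits "very nearly 2x".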
> > Further still using call & return instructions as conventionally as possible
> > meshes extremely well with current processor implementations which, because
> > of the extensive use thereon of conventional stack-oriented language
> > implementations, have done a great job optimizing call/return.
>
> Unconditional jumps for sends also benefit from hardware
> optimisation. Returns turn into indirect jumps which are less
> efficient, but getting better with Core 2.
and Power
>
>
> Cheers
> Bryce
>