On Sun, Feb 22, 2009 at 12:54 PM, bryce@kampjes.demon.co.uk wrote:
Eliot Miranda writes:
On Sun, Feb 22, 2009 at 10:37 AM, bryce@kampjes.demon.co.uk wrote:
Eliot Miranda writes:
But what I really think is that this is too low a level to worry about. Much more important to focus on
- context to stack mapping
- in-line caching via a JIT
- exploiting multicore via Hydra
and beyond (e.g. speculative inlining) than worrying about tiny micro-optimizations like this :)
If you're planning on adding speculative (I assume Self-style dynamic) inlining, won't that reduce the value of context-to-stack mapping?
Not at all; in fact quite the reverse. Context-to-stack mapping allows one to retain contexts while having the VM execute efficient, stack-based code (i.e. using hardware call instructions). This in turn enables the entire adaptive optimizer, including the stack analyser and the bytecode-to-bytecode compiler/method inliner, to be written in Smalltalk. The image-level code can examine the run-time stack using contexts as its interface without having to understand native stack formats or different ISAs. The optimizer is therefore completely portable, with all machine specificities confined to the underlying VM, which is much simpler by virtue of not containing a sophisticated optimizer (which one would have to squeeze through Slang etc).
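[To make the idea concrete, here is a minimal hypothetical sketch in C. All names (Frame, Context, contextForFrame, allocateContext) are invented for illustration; this is not Cog's actual implementation, just the general shape: methods run on a native stack, and a heap Context object is only materialized when image-level code asks for one.]

#include <stdlib.h>

typedef struct Context Context;
typedef struct Frame   Frame;

struct Frame {
    Frame   *callerFrame;  /* saved frame pointer, as in any native calling convention */
    void    *returnPC;     /* pushed by the hardware call instruction */
    void    *method;       /* the compiled method running in this frame */
    Context *context;      /* NULL until image-level code asks for a context */
    /* receiver, arguments and temporaries live here, as on any C stack */
};

struct Context {
    Frame *frame;          /* back-pointer; cleared once the frame returns */
    void  *method;
    /* ... receiver, temps, sender, pc/sp snapshots once the frame dies ... */
};

static Context *allocateContext(void)  /* stand-in for the VM's GC allocator */
{
    return calloc(1, sizeof(Context));
}

/* Image-level tools (the optimizer, the debugger) call something like this
   instead of parsing the native stack: they only ever see Context objects,
   never raw frames, so they stay portable across ISAs. */
Context *contextForFrame(Frame *f)
{
    if (f->context == NULL) {
        Context *c = allocateContext();
        c->frame   = f;        /* proxy: reads/writes delegate to the live frame */
        c->method  = f->method;
        f->context = c;        /* frame remembers it has been reified */
    }
    return f->context;
}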
All you need is the optimiser to run early in compilation for it to be portable.
...and for it to be untimely. An adaptive optimizer by definition needs to be running intermittently all the time. It optimizes what is happening now, not what happened at start-up.
And we definitely agree on trying to keep complex logic out of the VM. Sounds like you're thinking of AoSTa.
yes (AOStA).
So for me, context-to-stack mapping is fundamental to implementing speculative inlining in Smalltalk.
My view with Exupery is that context caches should be left until after dynamic inlining, as their value will depend on how well dynamic inlining reduces the number of sends.
I know, and I disagree. Dynamic inlining depends on collecting good type information, something that inline caches do well. Inline caches are efficiently implemented with native call instructions, either to method entry-points or PIC jump tables. Native call instructions mesh well with stacks. So context-to-stack mapping, for me, is a sensible enabling optimization for speculative inlining because it meshes well with inline caches.
PICs are a separate issue. Exupery has PICs, and has had them for years now. PICs are just as easily implemented as jumps.
Yes, PICs are jump tables. But, at least in my implementation and in others I know of, they get called. They are composed of a jump table that then jumps into methods at a point past any entry-point dynamic-binding/type checking.
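[Schematically, again in hypothetical C with invented names (PIC, picDispatch, fullLookup), not either VM's real code: the send site calls the PIC, and the PIC compares the receiver's class against a few cached cases, jumping into each method past its entry-point class check:]

#include <stddef.h>

#define PIC_SLOTS 4

typedef struct Class  Class;
typedef struct Object { Class *klass; } Object;
typedef Object *(*MethodBody)(Object *receiver);  /* entry point past the class check */

typedef struct PIC {
    int        used;
    Class     *classes[PIC_SLOTS];
    MethodBody bodies[PIC_SLOTS];
} PIC;

/* Stub for the slow path: a real VM would do a full message lookup here,
   then grow this PIC or go megamorphic. */
static Object *fullLookup(Object *receiver)
{
    (void)receiver;
    return NULL;
}

/* The send site calls this; on a hit we jump into the method body,
   skipping the entry-point type check it would otherwise perform. */
Object *picDispatch(PIC *pic, Object *receiver)
{
    for (int i = 0; i < pic->used; i++)
        if (pic->classes[i] == receiver->klass)
            return pic->bodies[i](receiver);  /* jump past the type check */
    return fullLookup(receiver);              /* miss: extend PIC or go megamorphic */
}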
Further, context-to-stack mapping is such a huge win that it'll be of benefit even if the VM is spending 90% of its time in inlined call-less code. We see a speedup of very nearly 2x (48% sticks in my head) for one non-micro tree-walking benchmark from the computer language shootout. And this is in a very slow VM. In a faster VM context-to-stack mapping would be even more valuable, because it would save an even greater percentage of overall execution time.
I see only one sixth of the time going into context creation for the send benchmark, which is about as send-heavy as you can get. That's running native code at about twice Squeak's speed. Also there's still plenty of inefficiency in Exupery's call/return sequences. So you could get a 17% speedup if you could remove the context overhead.
That's quite a tidy gain. I see a 26% increase in benchFib performance between base Squeak and the StackVM with no native code at all.
What are the inefficiencies in Exupery's call/return sequences?
Further still, using call & return instructions as conventionally as possible meshes extremely well with current processor implementations, which, because of the extensive use thereon of conventional stack-oriented language implementations, have done a great job optimizing call/return.
Unconditional jumps for sends also benefit from hardware optimisation. Returns turn into indirect jumps, which are less efficient but getting better with Core 2.
and Power
Cheers,
Bryce