Eliot Miranda writes:
On Sun, Feb 22, 2009 at 10:37 AM, bryce@kampjes.demon.co.uk wrote:
Eliot Miranda writes:
But what I really think is that this is too low a level to worry about. Much more important to focus on
- context to stack mapping
- in-line caching via a JIT
- exploiting multicore via Hydra
and beyond (e.g. speculative inlining) than worrying about tiny micro-optimizations like this :)
If you're planning on adding speculative (I assume Self-style dynamic) inlining, won't that reduce the value of context-to-stack mapping?
Not at all; in fact quite the reverse. Context to stack mapping allows one to retain contexts while having the VM execute efficient, stack-based code (i.e. using hardware call instructions). This in turn enables the entire adaptive optimizer, including the stack analyser and the bytecode-to-bytecode compiler/method inliner to be written in Smalltalk. The image level code can examine the run-time stack using contexts as their interface without having to understand native stack formats or different ISAs. The optimizer is therefore completely portable with all machine specificities confined to the underlying VM which is much simpler by virtue of not containing a sophisticated optimizer (which one would have to squeeze through Slang etc).
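To make the idea concrete, here is a toy sketch of context-to-stack mapping in Python. It is not how any real Squeak VM does it: `ToyVM`, `Frame`, and `Context` are hypothetical names, and the sketch ignores what happens to a context once its frame returns. It only shows the point made above: activations live as cheap stack frames, and a context object is created lazily when image-level code asks to inspect the stack.

```python
class Frame:
    """Activation record kept on the VM's stack; cheap to push and pop."""

    def __init__(self, method, receiver):
        self.method = method
        self.receiver = receiver
        self.context = None  # filled in only if image-level code asks for it


class Context:
    """Image-level view of one frame; hides the underlying stack layout."""

    def __init__(self, vm, index):
        self._vm = vm
        self._index = index

    @property
    def method(self):
        return self._vm.stack[self._index].method

    @property
    def sender(self):
        # The bottom frame has no sender.
        if self._index == 0:
            return None
        return self._vm.context_for(self._index - 1)


class ToyVM:
    """Hypothetical VM: runs on a plain stack, makes contexts on demand."""

    def __init__(self):
        self.stack = []
        self.contexts_created = 0

    def call(self, method, receiver=None):
        self.stack.append(Frame(method, receiver))

    def ret(self):
        self.stack.pop()

    def context_for(self, index):
        frame = self.stack[index]
        if frame.context is None:  # create the context lazily, once per frame
            frame.context = Context(self, index)
            self.contexts_created += 1
        return frame.context

    def this_context(self):
        return self.context_for(len(self.stack) - 1)


vm = ToyVM()
for name in ("a", "b", "c"):
    vm.call(name)
created_before = vm.contexts_created  # no contexts exist until someone looks

# "Image-level" code walks the stack purely through the Context interface.
names = []
ctx = vm.this_context()
while ctx is not None:
    names.append(ctx.method)
    ctx = ctx.sender
```

The stack walker at the bottom never touches `Frame` directly, which is the portability argument in miniature: only `ToyVM` knows the frame layout.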
All you need is the optimiser to run early in compilation for it to be portable.
And we definitely agree on trying to keep complex logic out of the VM. Sounds like you're thinking of AoSTa.
So for me, context-to-stack mapping is fundamental to implementing speculative inlining in Smalltalk.
My view with Exupery is context caches should be left until after dynamic inlining as their value will depend on how well dynamic inlining reduces the number of sends.
I know and I disagree. Dynamic inlining depends on collecting good type information, something that inline caches do well. In-line caches are efficiently implemented with native call instructions, either to method entry-points or PIC jump tables. Native call instructions mesh well with stacks. So context-to-stack mapping, for me, is a sensible enabling optimization for speculative inlining because it meshes well with inline caches.
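As a rough illustration of why inline caches double as a source of type information, here is a minimal Python model of a send site with a small polymorphic inline cache. Everything in it is an assumption made for the sketch: `CallSite`, `MAX_ENTRIES`, and the `Square`/`Circle` classes are hypothetical, and a real VM would patch native call instructions or PIC jump tables rather than consult a dictionary.

```python
class CallSite:
    """One send site with a tiny polymorphic inline cache (PIC)."""

    MAX_ENTRIES = 4  # assumed PIC size limit, chosen only for illustration

    def __init__(self, selector):
        self.selector = selector
        self.cache = {}   # receiver class -> method found by full lookup
        self.misses = 0

    def send(self, receiver, *args):
        cls = type(receiver)
        method = self.cache.get(cls)
        if method is None:
            # Cache miss: do the slow lookup and remember the result.
            self.misses += 1
            method = getattr(cls, self.selector)
            if len(self.cache) < self.MAX_ENTRIES:
                self.cache[cls] = method
        return method(receiver, *args)

    def observed_classes(self):
        """Receiver classes seen here: the type feedback an inliner could use."""
        return list(self.cache)


class Square:
    def __init__(self, side):
        self.side = side

    def area(self):
        return self.side * self.side


class Circle:
    def __init__(self, radius):
        self.radius = radius

    def area(self):
        return 3 * self.radius * self.radius  # crude pi keeps the arithmetic exact


site = CallSite("area")
results = [site.send(shape) for shape in (Square(2), Square(3), Circle(1))]
```

After these three sends the site has missed twice, once per receiver class, and `site.observed_classes()` lists exactly the classes a speculative inliner would want to guard on.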
PICs are a separate issue. Exupery has PICs, and has had them for years now. PICs are just as easily implemented as jumps.
Further, context-to-stack mapping is such a huge win that it'll be of benefit even if the VM is spending 90% of its time in inlined call-less code. We see a speedup of very nearly 2x (48% sticks in my head) for one non-micro tree walking benchmark from the computer language shootout. And this is in a very slow VM. In a faster VM context-to-stack mapping would be even more valuable, because it would save an even greater percentage of overall execution time.
I see only one sixth of the time going into context creation for the send benchmark which is about as send heavy as you can get. That's running native code at about twice Squeak's speed. Also there's still plenty of inefficiency in Exupery's call return sequences.
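As a back-of-the-envelope check on what the one-sixth figure implies, the snippet below applies the usual Amdahl-style bound. It assumes context creation could be eliminated entirely and that nothing else in the send path changes; `max_speedup` is a name made up for this sketch.

```python
def max_speedup(fraction_removed):
    """Upper bound on speedup if a given fraction of run time disappears entirely."""
    if not 0 <= fraction_removed < 1:
        raise ValueError("fraction_removed must be in [0, 1)")
    return 1.0 / (1.0 - fraction_removed)


# One sixth of the send benchmark's time goes into context creation.
bound = max_speedup(1 / 6)
```

Under those assumptions `bound` comes out at about 1.2, so removing context creation alone would make that benchmark at most roughly 20% faster.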
Further still, using call & return instructions as conventionally as possible meshes extremely well with current processor implementations which, because of the extensive use thereon of conventional stack-oriented language implementations, have done a great job optimizing call/return.
Unconditional jumps for sends also benefit from hardware optimisation. Returns turn into indirect jumps, which are less efficient but getting better with Core 2.
Cheers Bryce