On Mon, Feb 23, 2009 at 1:49 PM, bryce@kampjes.demon.co.uk wrote:
Eliot Miranda writes:
On Sun, Feb 22, 2009 at 12:54 PM, bryce@kampjes.demon.co.uk wrote:
All you need is the optimiser to run early in compilation for it to be portable.
...and for it to be untimely. An adaptive optimizer by definition needs to be running intermittently all the time. It optimizes what is happening now, not what happened at start-up.
Exupery runs as a Smalltalk background thread; it already uses dynamic feedback to inline some primitives, including #at: and #at:put:.
I see only one sixth of the time going into context creation for the send benchmark which is about as send heavy as you can get. That's running native code at about twice Squeak's speed. Also there's still plenty of inefficiency in Exupery's call return sequences.
So you could get a 17% speedup if you could remove the context overhead. That's quite a tidy gain. I see a 26% increase in benchFib performance between base Squeak and the StackVM with no native code at all.
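As a rough check on the arithmetic behind the one-sixth figure, here is a minimal sketch in Python. The function names are illustrative only and come from neither Exupery nor Cog, and the calculation assumes the context-creation overhead could be removed entirely, which is the best case. Removing a fraction f of the run time saves f of the time and raises throughput by 1/(1 - f) - 1, so one sixth corresponds to roughly 17% less run time, or roughly 20% more work per unit time.

```python
def time_saved(overhead_fraction):
    """Fraction of total run time saved if the overhead vanished completely."""
    return overhead_fraction


def throughput_gain(overhead_fraction):
    """Relative increase in work done per unit time after removing the overhead."""
    return 1.0 / (1.0 - overhead_fraction) - 1.0


# Assumption: one sixth of the send benchmark's time is context creation.
f = 1 / 6
print(f"run time saved:  {time_saved(f):.1%}")
print(f"throughput gain: {throughput_gain(f):.1%}")
```

Which of the two numbers counts as "the speedup" depends on whether it is measured as time saved or as throughput gained.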
What are the inefficiencies in Exupery's call return sequences?
Exupery uses a C call sequence so it's easy to enter from the interpreter,
That's a non sequitur. The Cog VM has full interoperability between interpreter and machine code without a C calling convention. The interpreter doesn't call machine-code methods directly; instead it calls a trampoline that jumps to the right point in a machine-code method, allowing the system to optimize the common case of machine-code to machine-code calls (through inline caches). The return address of a machine-code frame above an interpreter frame is that of a routine that returns to the interpreter, so returns don't have to check whether they are returning to the interpreter.
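The arrangement described above can be illustrated with a toy model in Python. This is only a sketch of the idea, not Cog's implementation: every name here (`MachineCodeMethod`, `trampoline`, `machine_code_call`, the two return routines) is hypothetical, and Python callables stand in for machine addresses. The point it shows is that the return path is chosen once, by whoever enters the method, so the method itself never tests who its caller was.

```python
def return_to_interpreter(value, log):
    """Stand-in for the routine whose address sits above an interpreter frame."""
    log.append("return-to-interpreter routine")
    return value


def return_to_machine_code(value, log):
    """Stand-in for an ordinary return into a machine-code caller."""
    log.append("plain machine-code return")
    return value


class MachineCodeMethod:
    """Toy compiled method: runs its body, then jumps to its return address."""

    def __init__(self, name, body):
        self.name = name
        self.body = body  # callable taking the log and returning a value

    def run(self, return_address, log):
        log.append(f"run {self.name}")
        value = self.body(log)
        # No caller check here: the return address already encodes
        # whether control goes back to the interpreter or to machine code.
        return return_address(value, log)


def machine_code_call(method, log):
    """Common case: machine code calling machine code directly."""
    return method.run(return_to_machine_code, log)


def trampoline(method, log):
    """Interpreter entry point: installs the return-to-interpreter routine."""
    log.append("trampoline")
    return method.run(return_to_interpreter, log)
```

A small usage example: an interpreter-side call enters `outer` through the trampoline, and `outer` calls `inner` directly.

```python
log = []
inner = MachineCodeMethod("inner", lambda log: 1)
outer = MachineCodeMethod("outer", lambda log: machine_code_call(inner, log) + 1)
result = trampoline(outer, log)
print(result)
print(log)
```

Only the outermost frame goes through the return-to-interpreter routine; the inner call and return stay entirely on the machine-code path.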
that C call frame is torn down when exiting each compiled method then re-created when reentering native code. That's a complete waste when going from one native method to another.
So you go to all the effort of producing native code and then you wrap it in so much gunk that you get minimal performance benefit from it. I don't understand. What are your goals? Experimenting with compilers or producing an efficient Squeak VM?
Also the send/return sequence isn't yet that optimised; there are still plenty of inefficiencies due to lack of addressing modes etc., and because it's a fairly naive translation of the interpreter's send code.
17% would be rather optimistic; some of the work required to set up a context will always be required. Temporaries will still need to be nilled out, etc.
Again, invalid assumptions. Do without contexts (except as a frame access abstraction). Sufficient adaptive optimization can avoid temporary initializations (e.g. by embedding information that records live ranges).
Bryce