On Mon, Feb 23, 2009 at 1:49 PM, <bryce@kampjes.demon.co.uk> wrote:

Eliot Miranda writes:
> On Sun, Feb 22, 2009 at 12:54 PM, <bryce@kampjes.demon.co.uk> wrote:

> > All you need is the optimiser to run early in compilation for it to be
> > portable.
>
>
> ...and for it to be untimely. An adaptive optimizer by definition needs to
> be running intermittently all the time. It optimizes what is happening now,
> not what happened at start-up.

Exupery runs as a Smalltalk background thread; it already uses dynamic
feedback to inline some primitives, including #at: and #at:put:.

> > I see only one sixth of the time going into context creation for the
> > send benchmark, which is about as send-heavy as you can get. That's
> > running native code at about twice Squeak's speed. Also, there's still
> > plenty of inefficiency in Exupery's call/return sequences.
>
>
> So you could get a 17% speedup if you could remove the context overhead.
> That's quite a tidy gain. I see a 26% increase in benchFib performance
> between base Squeak and the StackVM with no native code at all.
>
> What are the inefficiencies in Exupery's call/return sequences?

Exupery uses a C call sequence, so it's easy to enter from the
interpreter,

That's a non sequitur. The Cog VM has full interoperability between interpreter and machine code without a C calling convention. The interpreter doesn't call machine-code methods; instead it calls a trampoline that jumps to the right point in a machine-code method, allowing the system to optimize the common case of machine-code to machine-code calls (through inline caches). The return address of a machine-code frame above an interpreter frame is that of a routine that does a return to the interpreter, so returns don't have to check for returning to the interpreter.

that C call frame is torn down when exiting each
compiled method and then re-created when re-entering native code. That's
a complete waste when going from one native method to another.

So you go to all the effort of producing native code and then you wrap it in so much gunk that you get minimal performance benefit from it. I don't understand. What are your goals? Experimenting with compilers or producing an efficient Squeak VM?
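The inline caches mentioned above are what make machine-code to machine-code sends cheap. Below is a minimal sketch of a monomorphic inline cache, simulated in Python for readability; a real JIT keeps this state in the machine code at the send site, and the `CallSite` and `lookup` names and the toy classes are hypothetical. Each send site remembers the last receiver class and the method found for it, so a repeated send costs one class comparison plus a direct call, and only a change of receiver class triggers a full lookup.

```python
class CallSite:
    """Monomorphic inline cache for a single send site (hypothetical simulation)."""

    def __init__(self, selector):
        self.selector = selector
        self.cached_class = None   # receiver class the cache is currently bound to
        self.cached_method = None  # method found for that class
        self.hits = 0
        self.misses = 0

    def send(self, receiver, *args):
        cls = type(receiver)
        if cls is self.cached_class:
            # Fast path: one class comparison, then a direct call, no lookup.
            self.hits += 1
            return self.cached_method(receiver, *args)
        # Miss: do the full lookup and rebind the cache to the new class.
        self.misses += 1
        method = lookup(cls, self.selector)
        self.cached_class, self.cached_method = cls, method
        return method(receiver, *args)


def lookup(cls, selector):
    """Full method lookup up the superclass chain (here, Python's MRO)."""
    for klass in cls.__mro__:
        if selector in vars(klass):
            return vars(klass)[selector]
    raise AttributeError(f"{cls.__name__} does not understand #{selector}")


class Point:
    def __init__(self, x):
        self.x = x

    def size(self):
        return self.x


class Pair:
    def size(self):
        return 2


site = CallSite("size")
total = sum(site.send(Point(i)) for i in range(5))  # 1 miss, then 4 hits
pair_size = site.send(Pair())                        # class changed: miss, rebind
print(total, pair_size, site.hits, site.misses)      # 10 2 4 2
```

A site that keeps seeing several receiver classes would thrash this single-entry cache; that case is usually handled with a small polymorphic cache, which this sketch leaves out.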

Also, the send/return sequence isn't that optimised yet; there's still
plenty of inefficiency due to a lack of addressing modes etc., and because
it's a fairly naive translation of the interpreter's send code.

17% would be rather optimistic; some of the work of setting up a
context will always be needed. Temporaries will still need to be
nilled out, etc.
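To put numbers on "optimistic": if a share f of run time goes to context creation and a share r of that work is eliminated, run time becomes 1 - f*r of the original, so the overall speedup is 1/(1 - f*r). With f = 1/6 (the figure from the thread) and all of it removed, run time falls by about 16.7%, which is a 1.2x speedup. The 50% split below is a purely hypothetical stand-in for "some of the work will always be needed", and the function name is illustrative.

```python
from fractions import Fraction


def overall_speedup(overhead_share, removed_share):
    """Whole-program speedup when `removed_share` (0..1) of an overhead that
    takes `overhead_share` of total run time is eliminated outright."""
    new_time = 1 - overhead_share * removed_share
    return 1 / new_time


context_share = Fraction(1, 6)  # "one sixth of the time" from the thread
all_removed = overall_speedup(context_share, Fraction(1))
half_removed = overall_speedup(context_share, Fraction(1, 2))  # hypothetical split

# prints "run time cut by 16.7%, speedup 1.20x"
print(f"run time cut by {float(context_share):.1%}, speedup {float(all_removed):.2f}x")
# prints "half the overhead removed: speedup 1.091x"
print(f"half the overhead removed: speedup {float(half_removed):.3f}x")
```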

Again, invalid assumptions. Do without contexts (except as a frame access abstraction). Sufficient adaptive optimization can avoid initializing temporaries (e.g. by embedding information that records live ranges).
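A much-simplified illustration of skipping temporary initialization, swapped in for whatever Cog actually does: in Smalltalk an unassigned temporary reads as nil, so only temporaries that can be read before their first store need an explicit nil. The op format and the `temps_needing_nil` name are hypothetical, and the scan assumes straight-line code. A method with branches would need a dataflow pass over its control-flow graph, and a real VM must also keep the garbage collector from scanning slots that were never initialized, which is presumably where recorded live ranges come in.

```python
def temps_needing_nil(ops):
    """Return the temporaries that may be read before they are written.

    `ops` is a hypothetical straight-line list of ("store", i) and ("push", i)
    pairs; only temps whose first use is a read need an explicit nil.
    """
    written = set()
    needs_nil = set()
    for op, index in ops:
        if op == "push" and index not in written:
            needs_nil.add(index)  # first use is a read: must start as nil
        elif op == "store":
            written.add(index)
    return sorted(needs_nil)


ops = [
    ("store", 0),  # t0 := ...
    ("push", 0),
    ("push", 1),   # t1 read before any store, so it must start as nil
    ("store", 1),
    ("store", 2),
    ("push", 2),
]
print(temps_needing_nil(ops))  # [1]
```

Here only one of three temporaries needs initializing; the other two are always stored before they are read.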

Bryce