<br><br><div class="gmail_quote">On Mon, Feb 23, 2009 at 1:49 PM, <span dir="ltr"><<a href="mailto:bryce@kampjes.demon.co.uk">bryce@kampjes.demon.co.uk</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;">
<div class="Ih2E3d"><br>
Eliot Miranda writes:<br>
> On Sun, Feb 22, 2009 at 12:54 PM, <<a href="mailto:bryce@kampjes.demon.co.uk">bryce@kampjes.demon.co.uk</a>> wrote:<br>
</div><div class="Ih2E3d"> > > All you need is the optimiser to run early in compilation for it to be<br>
> > portable.<br>
><br>
><br>
> ...and for it to be untimely. An adaptive optimizer by definition needs to<br>
> be running intermittently all the time. It optimizes what is happening now,<br>
> not what happened at start-up.<br>
<br>
</div>Exupery runs as a Smalltalk background thread; it already uses dynamic<br>
feedback to inline some primitives, including #at: and #at:put:.<br>
<div class="Ih2E3d"><br>
> > I see only one sixth of the time going into context creation for the<br>
> > send benchmark which is about as send heavy as you can get. That's<br>
> > running native code at about twice Squeak's speed. Also there's still<br>
> > plenty of inefficiency in Exupery's call return sequences.<br>
><br>
><br>
> So you could get a 17% speedup if you could remove the context overhead.<br>
> That's quite a tidy gain. I see a 26% increase in benchFib performance<br>
> between base Squeak and the StackVM with no native code at all.<br>
><br>
> What are the inefficiencies in Exupery's call return sequences?<br>
<br>
</div>Exupery uses a C call sequence so it's easy to enter from the<br>
interpreter,</blockquote><div><br></div><div>That's a non sequitur. The Cog VM has full interoperability between interpreter and machine code without using a C calling convention. The interpreter doesn't call machine-code methods directly; instead it calls a trampoline that jumps to the right point in a machine-code method, allowing the system to optimize the common case of machine-code-to-machine-code calls (through inline caches). The return address of a machine-code frame above an interpreter frame is that of a routine that returns to the interpreter, so ordinary returns don't have to check whether they are returning to interpreted code.</div>
<div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;"> that C call frame is torn down when exiting each<br>
compiled method, then re-created when re-entering native code. That's<br>
a complete waste when going from one native method to another.</blockquote><div><br></div><div>So you go to all the effort of producing native code and then you wrap it in so much gunk that you get minimal performance benefit from it. I don't understand. What are your goals: experimenting with compilers, or producing an efficient Squeak VM?</div>
<div><br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;">Also, the send/return sequence isn't yet that optimised; there are still<br>
plenty of inefficiencies due to a lack of addressing modes etc., and because<br>
it's a fairly naive translation of the interpreter's send code.<br>
<br>
17% would be rather optimistic; some of the work needed to set up a<br>
context will always be required. Temporaries will still need to be<br>
nilled out, etc.</blockquote><div><br></div><div>Again, invalid assumptions. Do without contexts (except as a frame access abstraction). Sufficient adaptive optimization can avoid temporary initializations (e.g. by embedding information that records live ranges).</div>
<div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;"><br>
<br>
Bryce<br>
</blockquote></div><br>