On Mon, May 23, 2011 at 8:42 PM, David T. Lewis <lewis@mail.msen.com> wrote:
 
On Mon, May 23, 2011 at 07:30:09PM -0400, David T. Lewis wrote:
> On Mon, May 23, 2011 at 02:33:52PM -0700, Eliot Miranda wrote:
> >
> > On Mon, May 23, 2011 at 2:08 PM, David T. Lewis <lewis@mail.msen.com> wrote:
> > >
> > >  Testing success status, original:
> > >        if (foo->successFlag) { ... }
> > >
> > >  Testing success status, new:
> > >        if (foo->primFailCode == 0) { ... }
> > >
> > >  Setting failure status, original:
> > >         foo->successFlag = 0;
> > >
> > >  Setting failure status, new:
> > >        if (foo->primFailCode == 0) {
> > >                foo->primFailCode = 1;
> > >        }
> > >
> > > So in each case the global struct is being used, both for successFlag
> > > and primFailCode. Sorry for the confusion. In any case, I'm still left
> > > scratching my head over the size of the performance difference.
> > >
> >
> > One thought, where are successFlag and primFailCode in the struct?  Perhaps
> > the size of the offset needed to access them makes a difference?
>
> In both cases they are the first element of the struct, so that
> cannot be it.
>
> I think I had better circle back and redo my tests. Maybe I made
> a mistake somewhere.
>
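
For reference, here is a minimal C sketch of the two status conventions quoted above. It assumes a simplified global struct: the struct name `VMGlobals` and the helper functions `oldSucceeded`, `oldFail`, `newSucceeded` and `newFail` are hypothetical. Only the field names `successFlag` and `primFailCode` and the test/set idioms come from the snippets. The sketch also shows one behavioural difference of the new scheme: setting a failure never overwrites a failure code that is already set.

```c
/* Hypothetical, simplified stand-in for the interpreter's global struct
   ("foo" in the snippets above). Only the field names are from the VM. */
struct VMGlobals {
    int successFlag;  /* original scheme: nonzero means success         */
    int primFailCode; /* new scheme: 0 means success, else a fail code  */
};

/* Original scheme: test and set successFlag directly. */
static int oldSucceeded(const struct VMGlobals *foo)
{
    return foo->successFlag;
}

static void oldFail(struct VMGlobals *foo)
{
    foo->successFlag = 0;
}

/* New scheme: success means "no failure code recorded". */
static int newSucceeded(const struct VMGlobals *foo)
{
    return foo->primFailCode == 0;
}

static void newFail(struct VMGlobals *foo)
{
    /* Record a generic failure (1) only if no code has been set yet,
       so an earlier, more specific failure code is preserved. */
    if (foo->primFailCode == 0) {
        foo->primFailCode = 1;
    }
}
```

For example, starting from `{ 1, 0 }`, calling `oldFail` and `newFail` leaves both `oldSucceeded` and `newSucceeded` returning 0, with `primFailCode` equal to 1; if `primFailCode` was already 3, `newFail` leaves it at 3.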

No mistake, the performance problem was real.

Good news - I found the cause. Better news - this may also give a
performance boost on StackVM and possibly Cog.

thanks!
 

The performance hit was due almost entirely to InterpreterPrimitives>>failed,
and perhaps slightly to #successful and #success: as well.

The issue with #failed is due to "^primFailCode ~= 0", which, for purposes
of C translation, can be recoded as "^primFailCode", with an override in
the simulator as "^primFailCode ~= 0". This produces a significant speed
improvement, making it at least as fast as the original interpreter
implementation using successFlag.
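The recoding is safe in C but not in the simulator. C treats any nonzero int as true, so returning the raw primFailCode behaves like "primFailCode != 0" wherever failed is used as a condition. In Smalltalk an Integer is not a Boolean, so the simulator still needs the "~= 0" comparison. The sketch below is hand-written C approximating the two translations, not actual VMMaker output. The function names `failedCompared` and `failedRaw` and the file-scope `primFailCode` variable are hypothetical. The two forms agree as truth values and differ only in the value returned.

```c
/* Hypothetical file-scope stand-in for the VM's primFailCode
   (0 = no failure, nonzero = failure code). */
static int primFailCode = 0;

/* Approximates the translation of "^primFailCode ~= 0":
   the result is normalized to 0 or 1. */
static int failedCompared(void)
{
    return primFailCode != 0;
}

/* Approximates the translation of "^primFailCode":
   the raw failure code is returned, which C treats as true when nonzero. */
static int failedRaw(void)
{
    return primFailCode;
}
```

With primFailCode set to 5, `failedCompared()` returns 1 and `failedRaw()` returns 5. Either one used as an `if` condition takes the failure branch.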

Note that with the Cog code generator, and for the purposes of the simulator, this can read:

failed
<api>
^self cCode: [primFailCode] inSmalltalk: [primFailCode ~= 0]

The Cog inliner maps self cCode: aCBlock inSmalltalk: anStBlock to aCBlock at TMethod creation time, thereby avoiding the inability to inline cCode:inSmalltalk:. See MessageNode>>asTranslatorNode: in the Cog VMMaker. I'll integrate it as such in Cog.


I expect the same change applied to StackInterpreter would give a similar
10% improvement (though I have not tried it). I don't know what to expect
with Cog, but it may give a boost there as well.

Changes attached, also included in VMMaker-dtl.237 on SqueakSource.

Dave