On Mon, May 23, 2011 at 07:30:09PM -0400, David T. Lewis wrote:
On Mon, May 23, 2011 at 02:33:52PM -0700, Eliot Miranda wrote:
On Mon, May 23, 2011 at 2:08 PM, David T. Lewis <lewis@mail.msen.com> wrote:
Testing success status, original: if (foo->successFlag) { ... }
Testing success status, new: if (foo->primFailCode == 0) { ... }
Setting failure status, original: foo->successFlag = 0;
Setting failure status, new: if (foo->primFailCode == 0) { foo->primFailCode = 1; }
So in each case the global struct is being used, both for successFlag and primFailCode. Sorry for the confusion. In any case, I'm still left scratching my head over the size of the performance difference.
One thought: where are successFlag and primFailCode in the struct? Perhaps the size of the offset needed to access them makes a difference?
In both cases they are the first element of the struct, so that cannot be it.
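For reference, the offset question can be checked mechanically. This is only a sketch: the two struct layouts below are hypothetical stand-ins (the real interpreter struct has many more fields, and otherState is an invented placeholder), but C guarantees that the first member of a struct sits at offset 0, so neither field needs a displacement when it comes first.

```c
#include <stddef.h>

/* Hypothetical, heavily reduced stand-ins for the interpreter's global
   struct before and after the change. Only the position of the first
   field matters for this check. */
struct old_layout { int successFlag;  int otherState; };
struct new_layout { int primFailCode; int otherState; };

/* Byte offset of the status field in each layout. */
static size_t old_status_offset(void) {
    return offsetof(struct old_layout, successFlag);
}

static size_t new_status_offset(void) {
    return offsetof(struct new_layout, primFailCode);
}
```

Both functions return 0, which matches the observation that the offset cannot explain the slowdown.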
I think I had better circle back and redo my tests. Maybe I made a mistake somewhere.
No mistake, the performance problem was real.
Good news - I found the cause. Better news - this may be good for a performance boost on StackVM and possibly Cog also.
The performance hit was due almost entirely to InterpreterPrimitives>>failed, and perhaps a little bit to #successful and #success: also.
The problem with #failed is its body, "^primFailCode ~= 0". For purposes of C translation it can be recoded as "^primFailCode", since C treats any nonzero value as true, with an override in the simulator that keeps "^primFailCode ~= 0" because Smalltalk conditionals require a Boolean. This produces a significant speed improvement, making the new code at least as fast as the original interpreter implementation using successFlag.
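To illustrate the two forms of #failed in C terms, here is a minimal sketch. It assumes the translator emits roughly one C function per Smalltalk method; the struct, the variable vm, and the function names are hypothetical and not taken from the generated VM source.

```c
/* Hypothetical stand-in for the VM's global interpreter struct. */
struct interp { int primFailCode; };
static struct interp vm = { 0 };

/* Rough equivalent of "^primFailCode ~= 0": the result is
   normalized to exactly 0 or 1 by the comparison. */
static int failed_compare(void) {
    return vm.primFailCode != 0;
}

/* Rough equivalent of "^primFailCode": the raw code is returned
   and the caller's if() treats any nonzero value as true. */
static int failed_raw(void) {
    return vm.primFailCode;
}

/* Returns 1 when both forms select the same branch for the given
   failure code, i.e. they are interchangeable as conditions. */
static int agree_as_condition(int code) {
    vm.primFailCode = code;
    int a = failed_compare() ? 1 : 0;
    int b = failed_raw() ? 1 : 0;
    return a == b;
}
```

The two forms differ only in the value returned (0 or 1 versus the raw code), not in which branch a caller takes, so the recoding is safe wherever the result is used purely as a condition. Whether a given compiler generates different machine code for them depends on the compiler and on inlining.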
I expect that the same change applied to StackInterpreter may give a similar 10% improvement (though I have not tried it). I don't know what to expect with Cog, but it may give a boost there as well.
Changes attached, also included in VMMaker-dtl.237 on SqueakSource.
Dave