On Mon, May 23, 2011 at 2:08 PM, David T. Lewis <lewis@mail.msen.com> wrote:

On Mon, May 23, 2011 at 01:44:48PM -0700, Eliot Miranda wrote:
>
> Hi David,
>
> the difference looks to me to do with the fact that successFlag is flat
> and primFailCode is in the VM struct. Try generating a VM where either
> primFailCode is also flat or, better still, all variables are flat. In my
> experience the flat form is faster on x86 (and faster with both the Intel
> and gcc compilers; not tested with LLVM yet). BTW, if you use the Cog
> generator it'll generate accesses to variables which might be in the VM
> struct as GIV(theVariableInQuestion) (where GIV stands for global
> interpreter variable), and this allows one to choose whether these variables
> are kept in a struct or kept as separate variables at compile-time instead
> of generation time, as controlled by the USE_GLOBAL_STRUCT compile-time
> constant, e.g. gcc -DUSE_GLOBAL_STRUCT=0 gcc3x-interp.c.
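
For readers unfamiliar with the GIV() convention described above, here is a minimal sketch of how such a macro can switch between the two layouts at compile time. The struct name demoInterpreterState and the demo* helper functions are made up for illustration and are not taken from the Cog-generated source; only GIV, foo, primFailCode and USE_GLOBAL_STRUCT come from this thread.

```c
#include <assert.h>

/* Default to the struct form unless overridden, e.g. -DUSE_GLOBAL_STRUCT=0. */
#ifndef USE_GLOBAL_STRUCT
# define USE_GLOBAL_STRUCT 1
#endif

#if USE_GLOBAL_STRUCT
/* Struct form: each interpreter variable is a field reached through foo. */
static struct demoInterpreterState {   /* hypothetical struct name */
    int primFailCode;
} demoState, *foo = &demoState;
# define GIV(variable) (foo->variable)
#else
/* Flat form: each interpreter variable is an ordinary global. */
static int primFailCode;
# define GIV(variable) variable
#endif

/* Code written against GIV() compiles unchanged in both layouts. */
void demoInitPrimCall(void) { GIV(primFailCode) = 0; }

void demoPrimitiveFail(void)
{
    if (GIV(primFailCode) == 0) {
        GIV(primFailCode) = 1;
    }
}

int demoSuccessful(void) { return GIV(primFailCode) == 0; }

/* Quick check that the helpers behave the same in either layout. */
void demoSelfCheck(void)
{
    demoInitPrimCall();
    assert(demoSuccessful());
    demoPrimitiveFail();
    assert(!demoSuccessful());
}
```

Compiling the same file once with -DUSE_GLOBAL_STRUCT=0 and once with -DUSE_GLOBAL_STRUCT=1 selects the layout, so the two forms can be compared without regenerating the source.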

Eliot,

Thanks, and I have to apologize because I quoted the code incorrectly
in my original message. The generated code before and after the change
actually looks like this (sorry I forgot the "foo"):

Ah, ok.

Testing success status, original:
if (foo->successFlag) { ... }

Testing success status, new:
if (foo->primFailCode == 0) { ... }

Setting failure status, original:
foo->successFlag = 0;

Setting failure status, new:
if (foo->primFailCode == 0) {
    foo->primFailCode = 1;
}

So in each case the global struct is being used, both for successFlag
and primFailCode. Sorry for the confusion. In any case, I'm still left
scratching my head over the size of the performance difference.

One thought, where are successFlag and primFailCode in the struct? Perhaps the size of the offset needed to access them makes a difference?
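
One way to check the offset question is with offsetof. The sketch below uses a made-up struct (demoFoo and its fields other than successFlag and primFailCode are assumptions, not the real interp.c layout). It relies on the fact that x86 encodes a base-plus-displacement operand with a one-byte displacement when the offset fits in a signed 8-bit value and with four bytes otherwise, which is one possible reason field position could matter; whether that explains the measured difference here is not established.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical layout: the real struct in interp.c has different fields. */
struct demoFoo {
    long stackPointer;
    char methodCache[4096];   /* a large array placed early pushes later fields far out */
    long successFlag;
    long primFailCode;
};

/* Offsets are non-negative, so a one-byte signed displacement covers 0..127. */
int demoFitsInDisp8(size_t offset)
{
    return offset <= 127;
}

/* In this made-up layout the first field is near, primFailCode is far. */
void demoCheckLayout(void)
{
    assert(demoFitsInDisp8(offsetof(struct demoFoo, stackPointer)));
    assert(!demoFitsInDisp8(offsetof(struct demoFoo, primFailCode)));
}
```

Running the same offsetof checks against the generated struct for both VMs would show whether successFlag and primFailCode sit at comparable offsets.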

Dave

>
> On Sun, May 22, 2011 at 8:54 AM, David T. Lewis <lewis@mail.msen.com> wrote:
>
> >
> > I have been trying to gradually update trunk VMMaker to better align
> > with oscog VMMaker (an admittedly slow process, but hopefully still
> > worthwhile). I have gotten the interpreter primitives moved into class
> > InterpreterPrimitives and verified no changes to generated code. This
> > greatly reduces the clutter in class Interpreter, so it's a nice change
> > I think.
> >
> > My next step was to update all of the primitives to use the
> > #primitiveFailFor: idiom, in which the successFlag variable is replaced
> > with primFailCode (an integer value: 0 for success; 1, 2, 3... for
> > failure codes). This would get us closer to the point where the standard
> > interpreter and stack/cog would use a common set of primitives. A lot of
> > changes were required for this, but the resulting VM works fine ...
> > except for performance.
> >
> > On a standard interpreter, use of primFailCode seems to result in a
> > nearly 12% reduction in bytecode performance as measured by tinyBenchmarks:
> >
> > Standard interpreter (using successFlag):
> > 0 tinyBenchmarks. '439108061 bytecodes/sec; 15264622 sends/sec'
> > 0 tinyBenchmarks. '433164128 bytecodes/sec; 14740358 sends/sec'
> > 0 tinyBenchmarks. '445993031 bytecodes/sec; 15040691 sends/sec'
> > 0 tinyBenchmarks. '440999138 bytecodes/sec; 15052960 sends/sec'
> > 0 tinyBenchmarks. '445993031 bytecodes/sec; 14485815 sends/sec'
> >
> > After updating the standard interpreter (using primFailCode):
> > 0 tinyBenchmarks. '393241167 bytecodes/sec; 14066256 sends/sec'
> > 0 tinyBenchmarks. '392036753 bytecodes/sec; 15040691 sends/sec'
> > 0 tinyBenchmarks. '393846153 bytecodes/sec; 14272953 sends/sec'
> > 0 tinyBenchmarks. '400625978 bytecodes/sec; 14991818 sends/sec'
> > 0 tinyBenchmarks. '393846153 bytecodes/sec; 15176750 sends/sec'
> >
> > This is a much larger performance difference than I expected to see.
> > Actually I expected no measurable difference at all, and I was just
> > testing to verify this. But 12% is a lot, so I want to ask if I'm
> > missing something?
> >
> > The changes to generated code generally take the form of:
> >
> > Testing success status, original:
> > if (successFlag) { ... }
> >
> > Testing success status, new:
> > if (foo->primFailCode == 0) { ... }
> >
> > Setting failure status, original:
> > successFlag = 0;
> >
> > Setting failure status, new:
> > if (foo->primFailCode == 0) {
> >     foo->primFailCode = 1;
> > }
> >
> > My approach to doing the updates was as follows:
> > - Replace all occurrences of "successFlag := true" with
> >   "self initPrimCall", which initializes primFailCode to 0.
> > - Replace all "successFlag := false" with "self primitiveFail".
> > - Replace all "successFlag ifTrue: [] ifFalse: []" with
> >   "self successful ifTrue: [] ifFalse: []".
> > - Update #primitiveFail, #failed and #success: to use primFailCode rather
> >   than successFlag.
> > - Remove successFlag variable.
> >
> > Obviously I don't want to publish the code on SqS/VMMaker, but I can mail
> > an interp.c if anyone wants to see the gory details (It is too large to
> > post on this mailing list though).
> >
> > Any advice appreciated. I suspect I'm missing something basic here.
> >
> > Thanks,
> > Dave
> >
> >
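
The bytecodes/sec figures quoted above can be checked with a few lines of arithmetic. The sketch below averages the five runs for each VM and computes the relative difference both ways; the function and array names are made up for illustration. With these numbers the drop from successFlag to primFailCode works out to roughly 10.5%, or equivalently the successFlag VM is roughly 11.7% faster, which matches the "nearly 12%" in the message when measured against the slower VM.

```c
#include <assert.h>

/* bytecodes/sec from the five tinyBenchmarks runs quoted in the thread */
static const double demoWithSuccessFlag[5] = {
    439108061, 433164128, 445993031, 440999138, 445993031
};
static const double demoWithPrimFailCode[5] = {
    393241167, 392036753, 393846153, 400625978, 393846153
};

double demoMean(const double *values, int count)
{
    double sum = 0.0;
    for (int i = 0; i < count; i++) {
        sum += values[i];
    }
    return sum / count;
}

/* Percentage drop, relative to the faster (successFlag) VM. */
double demoPercentDrop(void)
{
    double before = demoMean(demoWithSuccessFlag, 5);
    double after = demoMean(demoWithPrimFailCode, 5);
    return 100.0 * (before - after) / before;
}

/* Percentage speed-up, relative to the slower (primFailCode) VM. */
double demoPercentFaster(void)
{
    double before = demoMean(demoWithSuccessFlag, 5);
    double after = demoMean(demoWithPrimFailCode, 5);
    return 100.0 * (before - after) / after;
}

/* Sanity check: both figures land between 10% and 12%. */
void demoCheckFigures(void)
{
    assert(demoPercentDrop() > 10.0 && demoPercentDrop() < 12.0);
    assert(demoPercentFaster() > 10.0 && demoPercentFaster() < 12.0);
}
```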