[Vm-dev] Performance of primitiveFailFor: and use of primFailCode

Eliot Miranda eliot.miranda at gmail.com
Mon May 23 21:33:52 UTC 2011


On Mon, May 23, 2011 at 2:08 PM, David T. Lewis <lewis at mail.msen.com> wrote:

>
> On Mon, May 23, 2011 at 01:44:48PM -0700, Eliot Miranda wrote:
> >
> > Hi David,
> >
> >     the difference looks to me to be due to the fact that successFlag
> > is flat and primFailCode is in the VM struct.  Try generating a VM
> > where either primFailCode is also flat or, better still, all variables
> > are flat.  In my experience the flat form is faster on x86 (and faster
> > with both the Intel and gcc compilers; not tested with llvm yet).  BTW,
> > if you use the Cog generator it'll generate accesses to variables which
> > might be in the VM struct as GIV(theVariableInQuestion) (where GIV
> > stands for global interpreter variable), and this allows one to choose
> > whether these variables are kept in a struct or kept as separate
> > variables at compile-time instead of generation time, as controlled by
> > the USE_GLOBAL_STRUCT compile-time constant, e.g. gcc
> > -DUSE_GLOBAL_STRUCT=0 gcc3x-interp.c.
>
> Eliot,
>
> Thanks, and I have to apologize because I quoted the code incorrectly
> in my original message. The generated code before and after the change
> actually looks like this (sorry I forgot the "foo"):
>

Ah, ok.

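To make the GIV / USE_GLOBAL_STRUCT mechanism above concrete, the
compile-time switch might look something like the sketch below
(reconstructed from memory; check the actual generated gcc3x-interp.c
for the real definitions):

    #if USE_GLOBAL_STRUCT
    /* all interpreter variables live in one struct, reached via a pointer */
    # define GIV(interpreterInstVar) (foo->interpreterInstVar)
    #else
    /* "flat" form: each interpreter variable is a separate global */
    # define GIV(interpreterInstVar) interpreterInstVar
    #endif

    /* Generated code then reads identically either way, e.g.: */
    /*     if (GIV(primFailCode) == 0) { ... }                  */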

>
>  Testing success status, original:
>        if (foo->successFlag) { ... }
>
>  Testing success status, new:
>        if (foo->primFailCode == 0) { ... }
>
>  Setting failure status, original:
>        foo->successFlag = 0;
>
>  Setting failure status, new:
>        if (foo->primFailCode == 0) {
>                foo->primFailCode = 1;
>        }
>
> So in each case the global struct is being used, both for successFlag
> and primFailCode. Sorry for the confusion. In any case, I'm still left
> scratching my head over the size of the performance difference.
>


One thought: where are successFlag and primFailCode in the struct?  Perhaps
the size of the offset needed to access them makes a difference?
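
On x86 a struct member whose offset fits in a signed byte is reached with
a one-byte displacement, while anything past 127 bytes into the struct
needs a four-byte displacement. A quick offsetof check along these lines
would show where the two fields land (the layout below is invented purely
for illustration; it is not the real interp.c struct):

    #include <stddef.h>
    #include <stdio.h>

    /* Invented layout: the generated struct has dozens of fields, so
       either variable could land anywhere in it. */
    struct giv {
            long stackPointer;
            long successFlag;     /* early in the struct: one-byte
                                     displacement on x86 */
            char other[256];
            long primFailCode;    /* past 127 bytes: four-byte
                                     displacement */
    };

    int main(void)
    {
            printf("successFlag  at offset %lu\n",
                    (unsigned long)offsetof(struct giv, successFlag));
            printf("primFailCode at offset %lu\n",
                    (unsigned long)offsetof(struct giv, primFailCode));
            return 0;
    }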


> Dave
>
> >
> > On Sun, May 22, 2011 at 8:54 AM, David T. Lewis <lewis at mail.msen.com>
> > wrote:
> >
> > >
> > > I have been trying to gradually update trunk VMMaker to better align
> > > with oscog VMMaker (an admittedly slow process, but hopefully still
> > > worthwhile).  I have gotten the interpreter primitives moved into class
> > > InterpreterPrimitives and verified no changes to generated code. This
> > > greatly reduces the clutter in class Interpreter, so it's a nice change
> > > I think.
> > >
> > > My next step was to update all of the primitives to use the
> > > #primitiveFailFor: idiom, in which the successFlag variable is
> > > replaced with primFailCode (an integer value: 0 for success; 1, 2,
> > > 3... for failure codes). This would get us closer to the point where
> > > the standard interpreter and stack/cog would use a common set of
> > > primitives. A lot of changes were required for this, but the
> > > resulting VM works fine ... except for performance.
> > >
> > > On a standard interpreter, use of primFailCode seems to result in a
> > > nearly 12% reduction in bytecode performance as measured by
> > > tinyBenchmarks:
> > >
> > > Standard interpreter (using successFlag):
> > >  0 tinyBenchmarks. '439108061 bytecodes/sec; 15264622 sends/sec'
> > >  0 tinyBenchmarks. '433164128 bytecodes/sec; 14740358 sends/sec'
> > >  0 tinyBenchmarks. '445993031 bytecodes/sec; 15040691 sends/sec'
> > >  0 tinyBenchmarks. '440999138 bytecodes/sec; 15052960 sends/sec'
> > >  0 tinyBenchmarks. '445993031 bytecodes/sec; 14485815 sends/sec'
> > >
> > > After updating the standard interpreter (using primFailCode):
> > >  0 tinyBenchmarks. '393241167 bytecodes/sec; 14066256 sends/sec'
> > >  0 tinyBenchmarks. '392036753 bytecodes/sec; 15040691 sends/sec'
> > >  0 tinyBenchmarks. '393846153 bytecodes/sec; 14272953 sends/sec'
> > >  0 tinyBenchmarks. '400625978 bytecodes/sec; 14991818 sends/sec'
> > >  0 tinyBenchmarks. '393846153 bytecodes/sec; 15176750 sends/sec'
> > >
> > > This is a much larger performance difference than I expected to see.
> > > Actually I expected no measurable difference at all, and I was just
> > > testing to verify this. But 12% is a lot, so I have to ask: am I
> > > missing something?
> > >
> > > The changes to generated code generally take the form of:
> > >
> > > Testing success status, original:
> > >        if (successFlag) { ... }
> > >
> > > Testing success status, new:
> > >        if (foo->primFailCode == 0) { ... }
> > >
> > > Setting failure status, original:
> > >        successFlag = 0;
> > >
> > > Setting failure status, new:
> > >        if (foo->primFailCode == 0) {
> > >                foo->primFailCode = 1;
> > >        }
> > >
> > > My approach to doing the updates was as follows (a C sketch of the
> > > resulting protocol appears at the end of this message):
> > > - Replace all occurrences of "successFlag := true" with "self
> > >   initPrimCall", which initializes primFailCode to 0.
> > > - Replace all "successFlag := false" with "self primitiveFail".
> > > - Replace all "successFlag ifTrue: [] ifFalse: []" with
> > >   "self successful ifTrue: [] ifFalse: []".
> > > - Update #primitiveFail, #failed and #success: to use primFailCode
> > >   rather than successFlag.
> > > - Remove the successFlag variable.
> > >
> > > Obviously I don't want to publish the code on SqS/VMMaker, but I can
> > > mail an interp.c if anyone wants to see the gory details (it is too
> > > large to post on this mailing list, though).
> > >
> > > Any advice appreciated. I suspect I'm missing something basic here.
> > >
> > > Thanks,
> > > Dave
> > >
> > >
>
>
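
For reference, the replacement protocol described in Dave's list above
boils down to something like this self-contained sketch (the struct and
names here are illustrative, not the real generated interp.c):

    #include <stdio.h>

    /* Illustrative stand-in for the VM's global struct; the real
       generated interp.c declares many more fields. */
    static struct { int primFailCode; } vm, *foo = &vm;

    /* successFlag := true   becomes   self initPrimCall */
    static void initPrimCall(void) { foo->primFailCode = 0; }

    /* successFlag := false  becomes   self primitiveFail; the guard
       keeps a specific failure code, once set, from being overwritten
       by a later generic one */
    static void primitiveFail(void)
    {
            if (foo->primFailCode == 0) {
                    foo->primFailCode = 1;
            }
    }

    /* successFlag tests become comparisons against zero */
    static int successful(void) { return foo->primFailCode == 0; }

    int main(void)
    {
            initPrimCall();                  /* start of a primitive call */
            primitiveFail();                 /* generic failure: code 1 */
            printf("failed = %d, code = %d\n",
                    !successful(), foo->primFailCode);
            return 0;
    }

The design point of the guard is that the first, most specific failure
code wins; the cost is a load, a compare and a branch where the old idiom
was a single store.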