<br><br><div class="gmail_quote">On Mon, May 23, 2011 at 2:08 PM, David T. Lewis <span dir="ltr">&lt;<a href="mailto:lewis@mail.msen.com">lewis@mail.msen.com</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;">
<div class="im"><br>
On Mon, May 23, 2011 at 01:44:48PM -0700, Eliot Miranda wrote:<br>
><br>
> Hi David,<br>
><br>
> the difference looks to me to be due to the fact that successFlag is flat<br>
> and primFailCode is in the VM struct. Try generating a VM where either<br>
> primFailCode is also flat or, better still, all variables are flat. In my<br>
> experience the flat form is faster on x86 (and faster with both the Intel<br>
> and gcc compilers; not tested with LLVM yet). BTW, if you use the Cog<br>
> generator it'll generate accesses to variables which might be in the VM<br>
> struct as GIV(theVariableInQuestion) (where GIV stands for global<br>
> interpreter variable), and this allows one to choose whether these variables<br>
> are kept in a struct or kept as separate variables at compile-time instead<br>
> of generation time, as controlled by the USE_GLOBAL_STRUCT compile-time<br>
> constant, e.g. gcc -DUSE_GLOBAL_STRUCT=0 gcc3x-interp.c.<br>
<br>
</div>Eliot,<br>
<br>
Thanks, and I have to apologize because I quoted the code incorrectly<br>
in my original message. The generated code before and after the change<br>
actually looks like this (sorry I forgot the "foo"):<br></blockquote><div><br></div><div>Ah, ok.</div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;">
<br>
Testing success status, original:<br>
if (foo->successFlag) { ... }<br>
<div class="im"><br>
Testing success status, new:<br>
if (foo->primFailCode == 0) { ... }<br>
<br>
Setting failure status, original:<br>
</div> foo->successFlag = 0;<br>
<div class="im"><br>
Setting failure status, new:<br>
if (foo->primFailCode == 0) {<br>
foo->primFailCode = 1;<br>
}<br>
<br>
</div>So in each case the global struct is being used, both for successFlag<br>
and primFailCode. Sorry for the confusion. In any case, I'm still left<br>
scratching my head over the size of the performance difference.<br></blockquote><div><br></div><div><br></div><div>One thought, where are successFlag and primFailCode in the struct? Perhaps the size of the offset needed to access them makes a difference?</div>
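<div>To make the question concrete, here is a minimal sketch in C. Everything in it is an assumption made for illustration: the struct name, the field order, the table in the middle, and the helper functions are hypothetical and are not taken from the generated interp.c. It relies on one fact about x86: a displacement between -128 and 127 is encoded in a single byte, while anything larger takes four bytes, so a field far into the struct produces longer instructions than one near the start.</div>

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical layout, for illustration only. The real interpreter
   struct is generated by VMMaker and will look different. */
struct hypothetical_vm {
    long successFlag;        /* near the start of the struct: small offset */
    long methodCache[4096];  /* a large table sitting in between (assumed) */
    long primFailCode;       /* after the table: large offset */
};

/* x86 can encode a displacement of -128..127 in one byte (disp8);
   larger displacements need four bytes (disp32). Struct offsets are
   never negative, so only the upper bound matters here. */
int fits_disp8(size_t offset)
{
    return offset <= 127;
}

size_t offset_of_success_flag(void)
{
    return offsetof(struct hypothetical_vm, successFlag);
}

size_t offset_of_prim_fail_code(void)
{
    return offsetof(struct hypothetical_vm, primFailCode);
}

/* In this made-up layout, one field is reachable with a short
   displacement and the other is not. */
void check_hypothetical_layout(void)
{
    assert(fits_disp8(offset_of_success_flag()));
    assert(!fits_disp8(offset_of_prim_fail_code()));
}
```

<div>Applying offsetof to the real struct in the generated source would show whether the two variables actually fall on different sides of that boundary. Whether the longer encoding alone could account for the measured difference is an open question; the sketch only shows how to check the layout.</div>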
<div><br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;">
<br>
Dave<br>
<div><div></div><div class="h5"><br>
><br>
> On Sun, May 22, 2011 at 8:54 AM, David T. Lewis &lt;<a href="mailto:lewis@mail.msen.com">lewis@mail.msen.com</a>&gt; wrote:<br>
><br>
> ><br>
> > I have been trying to gradually update trunk VMMaker to better align<br>
> > with oscog VMMaker (an admittedly slow process, but hopefully still<br>
> > worthwhile). I have gotten the interpreter primitives moved into class<br>
> > InterpreterPrimitives and verified no changes to generated code. This<br>
> > greatly reduces the clutter in class Interpreter, so it's a nice change<br>
> > I think.<br>
> ><br>
> > My next step was to update all of the primitives to use the #primitiveFailFor:<br>
> > idiom, in which the successFlag variable is replaced with primFailCode<br>
> > (integer value, 0 for success, 1, 2, 3... for failure codes). This would<br>
> > get us closer to the point where the standard interpreter and stack/cog<br>
> > would use a common set of primitives. A lot of changes were required for<br>
> > this, but the resulting VM works fine ... except for performance.<br>
> ><br>
> > On a standard interpreter, use of primFailCode seems to result in a<br>
> > nearly 12% reduction in bytecode performance as measured by tinyBenchmarks:<br>
> ><br>
> > Standard interpreter (using successFlag):<br>
> > 0 tinyBenchmarks. '439108061 bytecodes/sec; 15264622 sends/sec'<br>
> > 0 tinyBenchmarks. '433164128 bytecodes/sec; 14740358 sends/sec'<br>
> > 0 tinyBenchmarks. '445993031 bytecodes/sec; 15040691 sends/sec'<br>
> > 0 tinyBenchmarks. '440999138 bytecodes/sec; 15052960 sends/sec'<br>
> > 0 tinyBenchmarks. '445993031 bytecodes/sec; 14485815 sends/sec'<br>
> ><br>
> > After updating the standard interpreter (using primFailCode):<br>
> > 0 tinyBenchmarks. '393241167 bytecodes/sec; 14066256 sends/sec'<br>
> > 0 tinyBenchmarks. '392036753 bytecodes/sec; 15040691 sends/sec'<br>
> > 0 tinyBenchmarks. '393846153 bytecodes/sec; 14272953 sends/sec'<br>
> > 0 tinyBenchmarks. '400625978 bytecodes/sec; 14991818 sends/sec'<br>
> > 0 tinyBenchmarks. '393846153 bytecodes/sec; 15176750 sends/sec'<br>
> ><br>
> > This is a much larger performance difference than I expected to see.<br>
> > Actually I expected no measurable difference at all, and I was just<br>
> > testing to verify this. But 12% is a lot, so I want to ask if I'm<br>
> > missing something?<br>
> ><br>
> > The changes to generated code generally take the form of:<br>
> ><br>
> > Testing success status, original:<br>
> > if (successFlag) { ... }<br>
> ><br>
> > Testing success status, new:<br>
> > if (foo->primFailCode == 0) { ... }<br>
> ><br>
> > Setting failure status, original:<br>
> > successFlag = 0;<br>
> ><br>
> > Setting failure status, new:<br>
> > if (foo->primFailCode == 0) {<br>
> > foo->primFailCode = 1;<br>
> > }<br>
> ><br>
> > My approach to doing the updates was as follows:<br>
> > - Replace all occurrences of "successFlag := true" with<br>
> > "self initPrimCall", which initializes primFailCode to 0.<br>
> > - Replace all "successFlag := false" with "self primitiveFail".<br>
> > - Replace all "successFlag ifTrue: [] ifFalse: []" with<br>
> > "self successful ifTrue: [] ifFalse: []".<br>
> > - Update #primitiveFail, #failed and #success: to use primFailCode rather<br>
> > than successFlag.<br>
> > - Remove successFlag variable.<br>
> ><br>
> > Obviously I don't want to publish the code on SqS/VMMaker, but I can mail<br>
> > an interp.c if anyone wants to see the gory details (It is too large to<br>
> > post on this mailing list though).<br>
> ><br>
> > Any advice appreciated. I suspect I'm missing something basic here.<br>
> ><br>
> > Thanks,<br>
> > Dave<br>
> ><br>
> ><br>
<br>
</div></div></blockquote></div><br>