<br><br><div class="gmail_quote">On Mon, May 23, 2011 at 2:08 PM, David T. Lewis <span dir="ltr">&lt;<a href="mailto:lewis@mail.msen.com">lewis@mail.msen.com</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;">

<div class="im"><br>

On Mon, May 23, 2011 at 01:44:48PM -0700, Eliot Miranda wrote:<br>

&gt;<br>

&gt; Hi David,<br>

&gt;<br>

&gt;     the difference looks to me to be due to the fact that successFlag is flat<br>

&gt; and primFailCode is in the VM struct.  Try generating a VM where either<br>

&gt; primFailCode is also flat or, better still, all variables are flat.  In my<br>

&gt; experience the flat form is faster on x86 (and faster with both the Intel<br>

&gt; and gcc compilers; not tested with LLVM yet).  BTW, if you use the Cog<br>

&gt; generator it&#39;ll generate accesses to variables which might be in the VM<br>

&gt; struct as GIV(theVariableInQuestion) (where GIV stands for global<br>

&gt; interpreter variable), and this allows one to choose whether these variables<br>

&gt; are kept in a struct or kept as separate variables at compile-time instead<br>

&gt; of generation time, as controlled by the USE_GLOBAL_STRUCT compile-time<br>

&gt; constant, e.g. gcc -DUSE_GLOBAL_STRUCT=0 gcc3x-interp.c.<br>

<br>

</div>Eliot,<br>

<br>

Thanks, and I have to apologize because I quoted the code incorrectly<br>

in my original message. The generated code before and after the change<br>

actually looks like this (sorry I forgot the &quot;foo&quot;):<br></blockquote><div><br></div><div>Ah, ok.</div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;">


<br>

  Testing success status, original:<br>

        if (foo-&gt;successFlag) { ... }<br>

<div class="im"><br>

  Testing success status, new:<br>

        if (foo-&gt;primFailCode == 0) { ... }<br>

<br>

  Setting failure status, original:<br>

</div>        foo-&gt;successFlag = 0;<br>

<div class="im"><br>

  Setting failure status, new:<br>

        if (foo-&gt;primFailCode == 0) {<br>

                foo-&gt;primFailCode = 1;<br>

        }<br>

<br>

</div>So in each case the global struct is being used, both for successFlag<br>

and primFailCode. Sorry for the confusion. In any case, I&#39;m still left<br>

scratching my head over the size of the performance difference.<br></blockquote><div><br></div><div><br></div><div>One thought: where are successFlag and primFailCode in the struct?  Perhaps the size of the offset needed to access them makes a difference?</div>

<div><br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;">

<br>

Dave<br>

<div><div></div><div class="h5"><br>

&gt;<br>

&gt; On Sun, May 22, 2011 at 8:54 AM, David T. Lewis &lt;<a href="mailto:lewis@mail.msen.com">lewis@mail.msen.com</a>&gt; wrote:<br>

&gt;<br>

&gt; &gt;<br>

&gt; &gt; I have been trying to gradually update trunk VMMaker to better align<br>

&gt; &gt; with oscog VMMaker (an admittedly slow process, but hopefully still<br>

&gt; &gt; worthwhile).  I have gotten the interpreter primitives moved into class<br>

&gt; &gt; InterpreterPrimitives and verified no changes to generated code. This<br>

&gt; &gt; greatly reduces the clutter in class Interpreter, so it&#39;s a nice change<br>

&gt; &gt; I think.<br>

&gt; &gt;<br>

&gt; &gt; My next step was to update all of the primitives to use the<br>

&gt; &gt; #primitiveFailFor:<br>

&gt; &gt; idiom, in which the successFlag variable is replaced with primFailCode<br>

&gt; &gt; (integer value, 0 for success, 1, 2, 3... for failure codes). This would<br>

&gt; &gt; get us closer to the point where the standard interpreter and stack/cog<br>

&gt; &gt; would use a common set of primitives. A lot of changes were required for<br>

&gt; &gt; this, but the resulting VM works fine ... except for performance.<br>

&gt; &gt;<br>

&gt; &gt; On a standard interpreter, use of primFailCode seems to result in a<br>

&gt; &gt; nearly 12% reduction in bytecode performance as measured by tinyBenchmarks:<br>

&gt; &gt;<br>

&gt; &gt; Standard interpreter (using successFlag):<br>

&gt; &gt;  0 tinyBenchmarks. &#39;439108061 bytecodes/sec; 15264622 sends/sec&#39;<br>

&gt; &gt;  0 tinyBenchmarks. &#39;433164128 bytecodes/sec; 14740358 sends/sec&#39;<br>

&gt; &gt;  0 tinyBenchmarks. &#39;445993031 bytecodes/sec; 15040691 sends/sec&#39;<br>

&gt; &gt;  0 tinyBenchmarks. &#39;440999138 bytecodes/sec; 15052960 sends/sec&#39;<br>

&gt; &gt;  0 tinyBenchmarks. &#39;445993031 bytecodes/sec; 14485815 sends/sec&#39;<br>

&gt; &gt;<br>

&gt; &gt; After updating the standard interpreter (using primFailCode):<br>

&gt; &gt;  0 tinyBenchmarks. &#39;393241167 bytecodes/sec; 14066256 sends/sec&#39;<br>

&gt; &gt;  0 tinyBenchmarks. &#39;392036753 bytecodes/sec; 15040691 sends/sec&#39;<br>

&gt; &gt;  0 tinyBenchmarks. &#39;393846153 bytecodes/sec; 14272953 sends/sec&#39;<br>

&gt; &gt;  0 tinyBenchmarks. &#39;400625978 bytecodes/sec; 14991818 sends/sec&#39;<br>

&gt; &gt;  0 tinyBenchmarks. &#39;393846153 bytecodes/sec; 15176750 sends/sec&#39;<br>

&gt; &gt;<br>

&gt; &gt; This is a much larger performance difference than I expected to see.<br>

&gt; &gt; Actually I expected no measurable difference at all, and I was just<br>

&gt; &gt; testing to verify this. But 12% is a lot, so I want to ask if I&#39;m<br>

&gt; &gt; missing something?<br>

&gt; &gt;<br>

&gt; &gt; The changes to generated code generally take the form of:<br>

&gt; &gt;<br>

&gt; &gt; Testing success status, original:<br>

&gt; &gt;        if (successFlag) { ... }<br>

&gt; &gt;<br>

&gt; &gt; Testing success status, new:<br>

&gt; &gt;        if (foo-&gt;primFailCode == 0) { ... }<br>

&gt; &gt;<br>

&gt; &gt; Setting failure status, original:<br>

&gt; &gt;        successFlag = 0;<br>

&gt; &gt;<br>

&gt; &gt; Setting failure status, new:<br>

&gt; &gt;        if (foo-&gt;primFailCode == 0) {<br>

&gt; &gt;                foo-&gt;primFailCode = 1;<br>

&gt; &gt;        }<br>

&gt; &gt;<br>

&gt; &gt; My approach to doing the updates was as follows:<br>

&gt; &gt; - Replace all occurrences of &quot;successFlag := true&quot; with &quot;self<br>

&gt; &gt; initPrimCall&quot;,<br>

&gt; &gt;  which initializes primFailCode to 0.<br>

&gt; &gt; - Replace all &quot;successFlag := false&quot; with &quot;self primitiveFail&quot;.<br>

&gt; &gt; - Replace all &quot;successFlag ifTrue: [] ifFalse: []&quot; with<br>

&gt; &gt;  &quot;self successful ifTrue: [] ifFalse: []&quot;.<br>

&gt; &gt; - Update #primitiveFail, #failed and #success: to use primFailCode rather<br>

&gt; &gt;  than successFlag.<br>

&gt; &gt; - Remove successFlag variable.<br>

&gt; &gt;<br>

&gt; &gt; Obviously I don&#39;t want to publish the code on SqS/VMMaker, but I can mail<br>

&gt; &gt; an interp.c if anyone wants to see the gory details (it is too large to<br>

&gt; &gt; post on this mailing list though).<br>

&gt; &gt;<br>

&gt; &gt; Any advice appreciated. I suspect I&#39;m missing something basic here.<br>

&gt; &gt;<br>

&gt; &gt; Thanks,<br>

&gt; &gt; Dave<br>

&gt; &gt;<br>

&gt; &gt;<br>

<br>

</div></div></blockquote></div><br>