<div dir="ltr">Hi Henrik,<br><div class="gmail_extra"><br><div class="gmail_quote">On Tue, Jul 12, 2016 at 1:07 AM, Henrik Johansen <span dir="ltr">&lt;<a href="mailto:henrik.s.johansen@veloxit.no" target="_blank">henrik.s.johansen@veloxit.no</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex"> <br><div style="word-wrap:break-word">Could also be something gone wonky with the interpreters method cache, causing excessive misses and the fallback code (lookupOrdinary... and addNewMethod..) to run.<div>Initially I&#39;d suspect there might&#39;ve been changes that&#39;d affect hashing (such as classTag - index in class table, being used instead of class pointer), but I guess that happened before the first version.</div><div><br></div><div>I would note, in the generated code for spur in the git repo, there&#39;s a case on top of addNewMethodToCache that doesn&#39;t actually add the found method to cache:</div><div><div>if (!(((GIV(newMethod) &amp; (tagMask())) == 0)</div><div><span style="white-space:pre-wrap">        </span> &amp;&amp; (((((usqInt) (longAt(GIV(newMethod)))) &gt;&gt; (formatShift())) &amp; (formatMask())) &gt;= (firstCompiledMethodFormat())))) {</div><div><span style="white-space:pre-wrap">        </span>primitiveFunctionPointer = primitiveInvokeObjectAsMethod;</div><div><span style="white-space:pre-wrap">        </span>return;</div><div>}</div></div><div><br></div><div>I don&#39;t know exactly what firstCompiledMethodFormat() entails, but if it means found method that is a CompiledMethod, that would pretty much mean the MethodCache is entirely unused, no?</div></div></blockquote><div><br></div><div>The expression is ! (not) &quot;is it non-immediate with compiled method format&quot;.  So this expression adds the primitiveInvokeObjectAsMethod as the primitive whenever what the lookup finds is /not/ a compiled method.  This is the implementation of &quot;objects as methods&quot;.  All compiled methods are non-immediates and have a format &gt;= firstCompiledMethodFormat:</div><div><br></div><div>Behavior&gt;&gt;instSpec</div><div><span class="" style="white-space:pre">        </span>&quot;Answer the instance specification part of the format that defines what kind of object</div><div><span class="" style="white-space:pre">        </span> an instance of the receiver is.  The formats are</div><div><span class="" style="white-space:pre">                        </span>0<span class="" style="white-space:pre">        </span>= 0 sized objects (UndefinedObject True False et al)</div><div><span class="" style="white-space:pre">                        </span>1<span class="" style="white-space:pre">        </span>= non-indexable objects with inst vars (Point et al)</div><div><span class="" style="white-space:pre">                        </span>2<span class="" style="white-space:pre">        </span>= indexable objects with no inst vars (Array et al)</div><div><span class="" style="white-space:pre">                        </span>3<span class="" style="white-space:pre">        </span>= indexable objects with inst vars (MethodContext AdditionalMethodState et al)</div><div><span class="" style="white-space:pre">                        </span>4<span class="" style="white-space:pre">        </span>= weak indexable objects with inst vars (WeakArray et al)</div><div><span class="" style="white-space:pre">                        </span>5<span class="" style="white-space:pre">        </span>= weak non-indexable objects with inst vars (ephemerons) (Ephemeron)</div><div><span class="" style="white-space:pre">                        </span>6<span class="" style="white-space:pre">        </span>= unused</div><div><span class="" style="white-space:pre">                        </span>7<span class="" style="white-space:pre">        </span>= immediates (SmallInteger, Character)</div><div><span class="" style="white-space:pre">                        </span>8<span class="" style="white-space:pre">        </span>= unused</div><div><span class="" style="white-space:pre">                        </span>9<span class="" style="white-space:pre">        </span>= 64-bit indexable</div><div><span class="" style="white-space:pre">                </span>10-11<span class="" style="white-space:pre">        </span>= 32-bit indexable (Bitmap)<span class="" style="white-space:pre">                                        </span>(plus one odd bit, unused in 32-bits)</div><div><span class="" style="white-space:pre">                </span>12-15<span class="" style="white-space:pre">        </span>= 16-bit indexable<span class="" style="white-space:pre">                                                        </span>(plus two odd bits, one unused in 32-bits)</div><div><span class="" style="white-space:pre">                </span>16-23<span class="" style="white-space:pre">        </span>= 8-bit indexable<span class="" style="white-space:pre">                                                        </span>(plus three odd bits, one unused in 32-bits)</div><div><span class="" style="white-space:pre">                </span>24-31<span class="" style="white-space:pre">        </span>= compiled methods (CompiledMethod)<span class="" style="white-space:pre">        </span>(plus three odd bits, one unused in 32-bits)</div><div><span class="" style="white-space:pre">        </span> Note that in the VM instances also have a 5 bit format field that relates to their class&#39;s format.</div><div><span class="" style="white-space:pre">        </span> Formats 11, 13-15, 17-23 &amp; 25-31 are unused in classes but used in instances to define the</div><div><span class="" style="white-space:pre">        </span> number of elements missing up to the slot size.  For example, a 2-byte ByteString instance</div><div><span class="" style="white-space:pre">        </span> has format 18 in 32-bits, since its size is one 32-bit slot - 2 bytes ((18 bitAnd: 3) = 2), and</div><div><span class="" style="white-space:pre">        </span> 22 in 64 bits, since its size is one 64-bit slot - 6 bytes ((22 bitAnd: 7) = 6).&quot;</div><div><span class="" style="white-space:pre">        </span>^(format bitShift: -16) bitAnd: 16r1F</div><div><br></div><div>So the format encodes both &quot;it is a compiled method&quot; and the number of unused trailing elements.</div><div><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex"><div style="word-wrap:break-word"><div><br></div><div>Cheers,</div><div>Henry</div><div><br></div><div><div><blockquote type="cite"><div>On 07 Jul 2016, at 10:04 , Clément Bera &lt;<a href="mailto:bera.clement@gmail.com" target="_blank">bera.clement@gmail.com</a>&gt; wrote:</div><br><div><div dir="ltr">Yes. Number of samples and percentage of overall samples. You&#39;re right it&#39;s not that many. Here is the result when I run 5 times the bench (I usually use the profiler just as a hint to tweak performance, then the in-image results matter).<div><br></div><div>New VM:</div><div><font face="monospace, monospace"><br></font></div><div><div><font face="monospace, monospace">% of vanilla vm code (% of total)<span style="white-space:pre-wrap">                                                                                        </span>(samples) (cumulative)</font></div><div><font face="monospace, monospace">20.72%    (  9.98%)<span style="white-space:pre-wrap">        </span>scavengeReferentsOf<span style="white-space:pre-wrap">                                                                                </span>(2943)<span style="white-space:pre-wrap">        </span>(20.72%)</font></div><div><font face="monospace, monospace"><b><u>10.52%    (  5.07%)<span style="white-space:pre-wrap">        </span>lookupOrdinaryNoMNUEtcInClass<span style="white-space:pre-wrap">                                                                        </span>(1494)<span style="white-space:pre-wrap">        </span>(31.24%)</u></b></font></div><div><font face="monospace, monospace">  6.63%    (  3.19%)<span style="white-space:pre-wrap">        </span>processWeakSurvivor<span style="white-space:pre-wrap">                                                                                </span>(941)<span style="white-space:pre-wrap">        </span>(37.87%)</font></div><div><font face="monospace, monospace">  5.62%    (  2.71%)<span style="white-space:pre-wrap">        </span>copyAndForward<span style="white-space:pre-wrap">                                                                                        </span>(798)<span style="white-space:pre-wrap">        </span>(43.49%)</font></div><div><font face="monospace, monospace"> <b><u> 4.94%    (  2.38%)<span style="white-space:pre-wrap">        </span>addNewMethodToCache<span style="white-space:pre-wrap">                                                                                </span>(702)<span style="white-space:pre-wrap">        </span>(48.43%)</u></b></font></div><div><font face="monospace, monospace">  4.86%    (  2.34%)<span style="white-space:pre-wrap">        </span>doScavenge<span style="white-space:pre-wrap">                                                                                        </span>(690)<span style="white-space:pre-wrap">        </span>(53.29%)</font></div><div><font face="monospace, monospace">  3.20%    (  1.54%)<span style="white-space:pre-wrap">        </span>primitiveStringReplace<span style="white-space:pre-wrap">                                                                                </span>(455)<span style="white-space:pre-wrap">        </span>(56.50%)</font></div><div><font face="monospace, monospace">  3.06%    (  1.48%)<span style="white-space:pre-wrap">        </span>moveFramesInthroughtoPage<span style="white-space:pre-wrap">                                                                        </span>(435)<span style="white-space:pre-wrap">        </span>(59.56%)</font></div><div><font face="monospace, monospace">  2.94%    (  1.42%)<span style="white-space:pre-wrap">        </span>compact<span style="white-space:pre-wrap">                                                                                                </span>(418)<span style="white-space:pre-wrap">        </span>(62.50%)</font></div><div><font face="monospace, monospace"> <b><u> 2.87%    (  1.38%)<span style="white-space:pre-wrap">        </span>interpret<span style="white-space:pre-wrap">                                                                                        </span>(407)<span style="white-space:pre-wrap">        </span>(65.37%)</u></b></font></div><div><font face="monospace, monospace"> <b><u> 2.52%    (  1.21%)<span style="white-space:pre-wrap">        </span>ceSendFromInLineCacheMiss<span style="white-space:pre-wrap">                                                                        </span>(358)<span style="white-space:pre-wrap">        </span>(67.89%)</u></b></font></div></div><div><br></div><div>Old VM:</div><div><br></div><div><div><font face="monospace, monospace">% of vanilla vm code (% of total)<span style="white-space:pre-wrap">                                                </span>(samples) (cumulative)</font></div><div><font face="monospace, monospace">27.19%    (10.97%)<span style="white-space:pre-wrap">        </span>scavengeReferentsOf<span style="white-space:pre-wrap">                                        </span>(2615)<span style="white-space:pre-wrap">        </span>(27.19%)</font></div><div><font face="monospace, monospace">  9.80%    (  3.95%)<span style="white-space:pre-wrap">        </span>processWeakSurvivor<span style="white-space:pre-wrap">                                        </span>(943)<span style="white-space:pre-wrap">        </span>(36.99%)</font></div><div><font face="monospace, monospace">  8.76%    (  3.54%)<span style="white-space:pre-wrap">        </span>copyAndForward<span style="white-space:pre-wrap">                                                </span>(843)<span style="white-space:pre-wrap">        </span>(45.75%)</font></div><div><font face="monospace, monospace">  7.06%    (  2.85%)<span style="white-space:pre-wrap">        </span>doScavenge<span style="white-space:pre-wrap">                                                </span>(679)<span style="white-space:pre-wrap">        </span>(52.81%)</font></div><div><font face="monospace, monospace">  4.81%    (  1.94%)<span style="white-space:pre-wrap">        </span>primitiveStringReplace<span style="white-space:pre-wrap">                                        </span>(463)<span style="white-space:pre-wrap">        </span>(57.63%)</font></div><div><font face="monospace, monospace">  4.28%    (  1.73%)<span style="white-space:pre-wrap">        </span>moveFramesInthroughtoPage<span style="white-space:pre-wrap">                                </span>(412)<span style="white-space:pre-wrap">        </span>(61.91%)</font></div><div><font face="monospace, monospace">  <b><u>4.15%    (  1.67%)<span style="white-space:pre-wrap">        </span>lookupOrdinaryNoMNUEtcInClass<span style="white-space:pre-wrap">                                </span>(399)<span style="white-space:pre-wrap">        </span>(66.06%)</u></b></font></div><div><font face="monospace, monospace">  2.44%    (  0.99%)<span style="white-space:pre-wrap">        </span>marryFrameSP<span style="white-space:pre-wrap">                                                </span>(235)<span style="white-space:pre-wrap">        </span>(68.50%)</font></div><div><font face="monospace, monospace">  2.30%    (  0.93%)<span style="white-space:pre-wrap">        </span>isWidowedContext<span style="white-space:pre-wrap">                                        </span>(221)<span style="white-space:pre-wrap">        </span>(70.80%)</font></div><div><font face="monospace, monospace">  2.19%    (  0.88%)<span style="white-space:pre-wrap">        </span>handleStackOverflow<span style="white-space:pre-wrap">                                        </span>(211)<span style="white-space:pre-wrap">        </span>(72.99%)</font></div><div><font face="monospace, monospace">  1.64%    (  0.66%)<span style="white-space:pre-wrap">        </span>primitiveCompareString<span style="white-space:pre-wrap">                                        </span>(158)<span style="white-space:pre-wrap">        </span>(74.63%)</font></div><div><font face="monospace, monospace">  1.50%    (  0.60%)<span style="white-space:pre-wrap">        </span>findMethodWithPrimitiveFromContextUpToContext<span style="white-space:pre-wrap">        </span></font><span style="font-family:monospace,monospace;white-space:pre-wrap">        </span><span style="font-family:monospace,monospace">(144)</span><span style="font-family:monospace,monospace;white-space:pre-wrap">        </span><span style="font-family:monospace,monospace">(76.13%)</span></div><div><font face="monospace, monospace">  1.50%    (  0.60%)<span style="white-space:pre-wrap">        </span>ceBaseFrameReturn<span style="white-space:pre-wrap">                                        </span>(144)<span style="white-space:pre-wrap">        </span>(77.63%)</font></div><div><font face="monospace, monospace">  1.45%    (  0.58%)<span style="white-space:pre-wrap">        </span>ceStackOverflow<span style="white-space:pre-wrap">                                                </span>(139)<span style="white-space:pre-wrap">        </span>(79.07%)</font></div><div><font face="monospace, monospace">  1.16%    (  0.47%)<span style="white-space:pre-wrap">        </span>ceNonLocalReturn<span style="white-space:pre-wrap">                                        </span>(112)<span style="white-space:pre-wrap">        </span>(80.24%)</font></div><div><font face="monospace, monospace">  1.11%    (  0.45%)<span style="white-space:pre-wrap">        </span>allocateNewSpaceSlotsformatclassIndex<span style="white-space:pre-wrap">                        </span>(107)<span style="white-space:pre-wrap">        </span>(81.35%)</font></div><div><font face="monospace, monospace">  1.00%    (  0.40%)<span style="white-space:pre-wrap">        </span>isBytes<span style="white-space:pre-wrap">                                                        </span>(96)<span style="white-space:pre-wrap">                </span>(82.35%)</font></div><div><font face="monospace, monospace">  0.95%    (  0.38%)<span style="white-space:pre-wrap">        </span>fetchClassOfNonImm<span style="white-space:pre-wrap">                                        </span>(91)<span style="white-space:pre-wrap">                </span>(83.29%)</font></div><div><font face="monospace, monospace">  0.93%    (  0.37%)<span style="white-space:pre-wrap">        </span>stackValue<span style="white-space:pre-wrap">                                                </span>(89)<span style="white-space:pre-wrap">                </span>(84.22%)</font></div><div><font face="monospace, monospace">  0.90%    (  0.36%)<span style="white-space:pre-wrap">        </span>returnToExecutivepostContextSwitch<span style="white-space:pre-wrap">                        </span>(87)<span style="white-space:pre-wrap">                </span>(85.12%)</font></div><div><font face="monospace, monospace"> <b><u> 0.90%    (  0.36%)<span style="white-space:pre-wrap">        </span>interpret<span style="white-space:pre-wrap">                                                </span>(87)<span style="white-space:pre-wrap">                </span>(86.03%)</u></b></font></div><div><font face="monospace, monospace"> <b><u> 0.84%    (  0.34%)<span style="white-space:pre-wrap">        </span>addNewMethodToCache<span style="white-space:pre-wrap">                                        </span>(81)<span style="white-space:pre-wrap">                </span>(86.87%)</u></b></font></div><div><font face="monospace, monospace"><b><u>  0.83%    (  0.34%)<span style="white-space:pre-wrap">        </span>ceSendFromInLineCacheMiss<span style="white-space:pre-wrap">                                </span>(80)<span style="white-space:pre-wrap">                </span>(87.70%)</u></b></font></div></div><div><font face="monospace, monospace"><b><u><br></u></b></font></div><div><font face="arial, helvetica, sans-serif">Now if I look at the 4 methods highlighted, instead of the 3 I looked at before, I see ~7% overhead. I&#39;m not sure if compact got really slower or if it used to be inlined somewhere during slang compilation. I still think we should use these results as a hit and rely on the real benchmark for performance evaluation.</font></div><div><font face="arial, helvetica, sans-serif"><br></font></div><div><font face="arial, helvetica, sans-serif">If it&#39;s really a slang inlining problem, it will have side-effect on other functions, many will get a little bit faster, and it really seems this is the problem. </font></div><div><font face="arial, helvetica, sans-serif"><br></font></div><div><font face="arial, helvetica, sans-serif"><br></font></div></div><div class="gmail_extra"><br><div class="gmail_quote">On Thu, Jul 7, 2016 at 8:50 PM, Holger Freyther <span dir="ltr">&lt;<a href="mailto:holger@freyther.de" target="_blank">holger@freyther.de</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex"><br>
&gt; On 07 Jul 2016, at 19:41, Clément Bera &lt;<a href="mailto:bera.clement@gmail.com" target="_blank">bera.clement@gmail.com</a>&gt; wrote:<br>
&gt;<br>
&gt; Hi Holger,<br>
<br>
<br>
Hi!<br>
<span><br>
<br>
&gt; I&#39;m sorry for the delay since you reported the bug. Everyone working on the VM is very busy (In my case, my time is consumed by my phd and sista, I&#39;m trying to open an alpha of the image+runtime before ESUG running at ~1.5x).<br>
<br>
</span>great.<br>
<span><br>
<br>
&gt;<br>
&gt; Now that the VM profiler is fully working I was able to look into the regression.<br>
&gt;<br>
&gt; Old VM vanilla code:<br>
&gt;<br>
&gt; 27.94%    (11.54%)    scavengeReferentsOf                                                             (528)   (27.94%)<br>
&gt; 10.21%    (  4.22%)   processWeakSurvivor                                                             (193)   (38.15%)<br>
&gt;   8.15%    (  3.36%)  copyAndForward                                                                  (154)   (46.30%)<br>
&gt;   7.41%    (  3.06%)  doScavenge                                                                              (140)   (53.70%)<br>
&gt;   5.19%    (  2.14%)  lookupOrdinaryNoMNUEtcInClass                                   (98)            (58.89%)<br>
&gt;   4.71%    (  1.94%)  primitiveStringReplace                                                          (89)            (63.60%)<br>
&gt;   4.29%    (  1.77%)  moveFramesInthroughtoPage                                               (81)            (67.88%)<br>
&gt;   2.06%    (  0.85%)  isWidowedContext                                                                        (39)            (69.95%)<br>
&gt; [...]<br>
&gt;   0.90%    (  0.37%)  ceSendFromInLineCacheMiss                                               (17)            (85.61%)<br>
&gt;   0.85%    (  0.35%)  addNewMethodToCache                                                     (16)            (86.46%)<br>
&gt;<br>
&gt; New VM vanilla code:<br>
&gt;<br>
&gt; % of vanilla vm code (% of total)                                                                                     (samples) (cumulative)<br>
&gt; 22.41%    (10.44%)    scavengeReferentsOf                                                             (609)   (22.41%)<br>
&gt; 14.46%    (  6.74%)   lookupOrdinaryNoMNUEtcInClass                                   (393)   (36.87%)<br>
&gt;<br>
<br>
<br>
</span><span>&gt; I highlighted the problematic methods. It is likely that it has to do with slang inlining.<br>
&gt;<br>
&gt; The 3 methods highlighted seems to be responsible for an overhead of ~8.5% in the overall runtime. You seem to have an overhead of ~14% on my machine. There&#39;s quite a difference. The slang inlining overhead may have impacted other functions, and C profilers are never 100% accurate, so we will see when the problem is fixed if something else is also problematic. Now that we know what the problem is, I believe it will be fixed in the incoming weeks when someone (likely Eliot, maybe me) has time. Thanks for reporting the regression.<br>
<br>
</span>What does (393) and (98) mean? Is it number of samples?<br>
<span><font color="#888888"><br>
holger<br>
<br>
<br>
<br>
<br>
</font></span></blockquote></div><br></div>
</div></blockquote></div><br></div></div><br></blockquote></div><br><br clear="all"><div><br></div>-- <br><div class="gmail_signature" data-smartmail="gmail_signature"><div dir="ltr"><div><span style="font-size:small;border-collapse:separate"><div>_,,,^..^,,,_<br></div><div>best, Eliot</div></span></div></div></div>
</div></div>