<div dir="ltr"><div><div><div>Bingo,<br></div>here are the results without isKindOf checks:<br><br>[ 7432154326465436 digitCompare: 8432154326465436 ] bench.<br> &#39;16,100,000 per second. 62.1 nanoseconds per run.&#39;<br><br></div>that is more or less the speed of ByteArray comparison.<br><br></div>I think I will make a new pass on LargeIntegersPlugin :)<br></div><div class="gmail_extra"><br><div class="gmail_quote">2016-04-16 22:54 GMT+02:00 Nicolas Cellier <span dir="ltr">&lt;<a href="mailto:nicolas.cellier.aka.nice@gmail.com" target="_blank">nicolas.cellier.aka.nice@gmail.com</a>&gt;</span>:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div>Ah but maybe the &quot;Smart&quot; side of the plugin is causing trouble:<br><br>    firstInteger := self<br>                primitive: &#39;primDigitCompare&#39;<br>                parameters: #(#Integer )<br>                receiver: #Integer.<br><br>translates as:<br><br>        success(isKindOf(stackValue(0), &quot;Integer&quot;));<br>        secondInteger = stackValue(0);<br>        /* missing DebugCode */;<br>        success(isKindOf(stackValue(1), &quot;Integer&quot;));<br>        firstInteger = stackValue(1);<br><br></div><div>It might be faster to just check the 3 cases:<br><br>(interpreterProxy isIntegerObject: oop) or:<br>    [oopClass := interpreterProxy fetchClassOf: oop.<br>     oopClass == interpreterProxy classLargeNegativeInteger or: [oopClass == interpreterProxy classLargePositiveInteger]].<br><br></div><div>Moreover, we already test isIntegerObject further in the primitive code.<br></div><div>Since every LargeIntegersPlugin primitive is going thru this isKindOf check, there might be some low hanging fruits.<br></div></div><div class="HOEnZb"><div class="h5"><div class="gmail_extra"><br><div class="gmail_quote">2016-04-16 19:37 GMT+02:00 Nicolas Cellier <span dir="ltr">&lt;<a href="mailto:nicolas.cellier.aka.nice@gmail.com" 
target="_blank">nicolas.cellier.aka.nice@gmail.com</a>&gt;</span>:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div><div><div><div><div><div><div>Hi,<br></div>the primitive does nothing very special:<br></div>1) check for SmallInteger cases first, for quick return<br></div>2) then check the LargeInteger lengths if both receiver &amp; argument are Large<br></div>3) then check the LargeInteger digits if both have the same length<br><br></div><div>None of these 3 steps is expected to be slow.<br><br></div>A bleeding edge VM does the 3rd step using 32-bit limbs instead of 8-bit ones, but this does not change the performance ratio: it could only make a difference for giant integers, and these fit in 56 bits.<br>I got these with 32-bit Cog Spur on MacOSX:<br><br>[ 7432154326465436 digitCompare: 8432154326465436 ] bench.<br> &#39;8,980,000 per second. 111 nanoseconds per run.&#39;<span><br><br>| order |<br>order := (0 to: 255) as: ByteArray.<br>[ (ByteString compare: 7432154326465436 with: 8432154326465436 collated: order) - 2 ] bench.<br></span> &#39;16,400,000 per second. 60.8 nanoseconds per run.&#39;<br><br></div>Obviously, most of the time is spent elsewhere; it would be interesting to know where exactly.<span><font color="#888888"><br><br></font></span></div><span><font color="#888888">Nicolas</font></span><div><div><br><div><div><div><div><div><div><div><div><div class="gmail_extra"><br><div class="gmail_quote">2016-04-16 18:52 GMT+02:00 David T. Lewis <span dir="ltr">&lt;<a href="mailto:lewis@mail.msen.com" target="_blank">lewis@mail.msen.com</a>&gt;</span>:<br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><br>

It does seem odd that it would be different on Spur. You suggested that<br>

compiler optimization might be set differently for the two plugins. I<br>

cannot check (I&#39;m on an Ubuntu laptop, cannot build Spur due to some<br>

kind of autotools problem that I can&#39;t figure out right now), but if you<br>

are able to do a VM compile and save the console output to a file, then<br>

I think you can look at the CFLAGS setting for the two plugins ($ nohup ./mvm,<br>

but first comment out the part of mvm that asks if you want to clean).<br>
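
<br>

The CFLAGS comparison described above could be done with a small script over the saved console output. A minimal Python sketch, assuming each compile command in the log mentions the plugin name and carries its flags on the same line; the function name and that log layout are assumptions, not something mvm is known to guarantee:<br>

<pre>
```python
def optimization_flags(build_log: str, plugin_name: str) -> set[str]:
    """Collect the -O flags found on build-log lines that mention plugin_name.

    Assumption: a compile command names the plugin and lists its flags
    on one line, so a simple per-line token scan is enough.
    """
    flags = set()
    for line in build_log.splitlines():
        if plugin_name not in line:
            continue  # line belongs to some other compile step
        for token in line.split():
            if token.startswith("-O"):
                flags.add(token)
    return flags
```
</pre>

Calling it once with LargeIntegersPlugin and once with MiscPrimitivePlugin on the same log, then comparing the two sets, would show whether the optimization levels differ.<br>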

<br>

Aside from that, the actual generated code for primDigitCompare is<br>

doing quite a lot of stuff before it ever gets around to comparing the<br>

digits. I suspect it is not a very efficient primitive.<br>

<br>

Dave<br>

<div><div><br>

<br>

On Fri, Apr 15, 2016 at 05:52:02PM +0200, Levente Uzonyi wrote:<br>

&gt;<br>

&gt; This thread got derailed, and I feel like we should go back to its<br>

&gt; main point, which was that #digitCompare: using #primitiveDigitCompare is<br>

&gt; slow.<br>

&gt; It&#39;s so slow that #primitiveCompareString is about twice as quick in<br>

&gt; Spur VMs, which is odd.<br>

&gt;<br>

&gt; Levente<br>

&gt;<br>

&gt; On Thu, 14 Apr 2016, Levente Uzonyi wrote:<br>

&gt;<br>

&gt; &gt;Hi Dave,<br>

&gt; &gt;<br>

&gt; &gt;I dug a bit deeper into this problem, and I found that it&#39;s been around<br>

&gt; &gt;for ages. I compared the two primitives in three older images.<br>

&gt; &gt;The first one is Squeak 4.2 running on Cog r2714:<br>

&gt; &gt;<br>

&gt; &gt;[ 7432154326465436 digitCompare: 8432154326465436 ] bench.<br>

&gt; &gt;&#39;6,880,000 per second.&#39;<br>

&gt; &gt;<br>

&gt; &gt;| order |<br>

&gt; &gt;order := (0 to: 255) as: ByteArray.<br>

&gt; &gt;[ (ByteString compare: 7432154326465436 with: 8432154326465436 collated:<br>

&gt; &gt;order) - 2 ] bench.<br>

&gt; &gt;&#39;11,200,000 per second.&#39;<br>

&gt; &gt;<br>

&gt; &gt;The next one was an even older Cog VM running an image updated from<br>

&gt; &gt;3.10.2. The VM didn&#39;t have the primitives required to fetch the VM<br>

&gt; &gt;information. The results were &#39;8.22459348130374e6 per second.&#39; and<br>

&gt; &gt;&#39;1.19677724910036e7 per second.&#39;, respectively.<br>

&gt; &gt;<br>

&gt; &gt;The last image was a closure-enabled 3.10.2 running on the classic<br>

&gt; &gt;Interpreter VM. The results were &#39;3.911442911417717e6 per second.&#39; and<br>

&gt; &gt;&#39;4.84567866426715e6 per second.&#39;, respectively.<br>

&gt; &gt;<br>

&gt; &gt;Since in all VMs the seemingly more complex code (primitiveCompareString<br>

&gt; &gt;with a subtraction) was quicker than the simpler code<br>

&gt; &gt;(primitiveDigitCompare), I suspect that LargeIntegersPlugin is compiled<br>

&gt; &gt;with less aggressive optimization than MiscPrimitivePlugin.<br>

&gt; &gt;<br>

&gt; &gt;What&#39;s also interesting is that, based on these benchmarks, the<br>

&gt; &gt;performance got worse over the years. I think invoking a primitive has its<br>

&gt; &gt;cost in newer VMs.<br>

&gt; &gt;<br>

&gt; &gt;Levente<br>

&gt; &gt;<br>

&gt; &gt;On Thu, 14 Apr 2016, David T. Lewis wrote:<br>

&gt; &gt;<br>

&gt; &gt;&gt;Oops, I just realized that my &quot;interpreter VM&quot; results were from a<br>

&gt; &gt;&gt;debugging<br>

&gt; &gt;&gt;VM with compiler optimization off (too many VMs, sorry). Here are the<br>

&gt; &gt;&gt;results<br>

&gt; &gt;&gt;for a more representative interpreter VM. I can&#39;t explain the variation,<br>

&gt; &gt;&gt;but<br>

&gt; &gt;&gt;one conclusion is clear: my suggested &quot;optimization&quot; in the inbox is a<br>

&gt; &gt;&gt;bad idea<br>

&gt; &gt;&gt;(and I will move it to the treated inbox).<br>

&gt; &gt;&gt;<br>

&gt; &gt;&gt; &quot;Base performance in the trunk level V3 image with interpreter VM&quot;<br>

&gt; &gt;&gt; Time millisecondsToRun: [100000000 timesRepeat: [ a = b ]]. &quot;==&gt; 26669&quot;<br>

&gt; &gt;&gt; Time millisecondsToRun: [100000000 timesRepeat: [ a = c ]]. &quot;==&gt; 25052&quot;<br>

&gt; &gt;&gt; Time millisecondsToRun: [100000000 timesRepeat: [ a = d ]]. &quot;==&gt; 25275&quot;<br>

&gt; &gt;&gt;<br>

&gt; &gt;&gt; &quot;Performance after adding LargePositiveInteger&gt;&gt;=&quot;<br>

&gt; &gt;&gt; Time millisecondsToRun: [100000000 timesRepeat: [ a = b ]]. &quot;==&gt; 59224&quot;<br>

&gt; &gt;&gt; Time millisecondsToRun: [100000000 timesRepeat: [ a = c ]].  &quot;==&gt; 27824&quot;<br>

&gt; &gt;&gt; Time millisecondsToRun: [100000000 timesRepeat: [ a = d ]]. &quot;==&gt; 44324&quot;<br>

&gt; &gt;&gt;<br>

&gt; &gt;&gt;Dave<br>

&gt; &gt;&gt;<br>

&gt; &gt;&gt;<br>

&gt; &gt;&gt;<br>

&gt; &gt;&gt;On Wed, Apr 13, 2016 at 08:25:33PM -0400, David T. Lewis wrote:<br>

&gt; &gt;&gt;&gt;Hi Levente,<br>

&gt; &gt;&gt;&gt;<br>

&gt; &gt;&gt;&gt;I think you may be right. I repeated my test with an interpreter VM on a<br>

&gt; &gt;&gt;&gt;32-bit image (with loop count smaller because the interpreter VM is<br>

&gt; &gt;&gt;&gt;slower<br>

&gt; &gt;&gt;&gt;than Spur). The change that I put in the inbox does not have any benefit<br>

&gt; &gt;&gt;&gt;for the interpreter VM:<br>

&gt; &gt;&gt;&gt;<br>

&gt; &gt;&gt;&gt;<br>

&gt; &gt;&gt;&gt;  a := 7432154326465436. &quot;a big number&quot;<br>

&gt; &gt;&gt;&gt;  b := a + 1. &quot;low order digit changed&quot;<br>

&gt; &gt;&gt;&gt;  c := 8432154326465436. &quot;high order digit changed&quot;<br>

&gt; &gt;&gt;&gt;  d := 7432154026465436. &quot;a digit in the middle changed&quot;<br>

&gt; &gt;&gt;&gt;<br>

&gt; &gt;&gt;&gt;   &quot;Base performance in the trunk level V3 image with interpreter VM&quot;<br>

&gt; &gt;&gt;&gt;  Time millisecondsToRun: [5000000 timesRepeat: [ a = b ]]. &quot;==&gt; 3844&quot;<br>

&gt; &gt;&gt;&gt;  Time millisecondsToRun: [5000000 timesRepeat: [ a = c ]]. &quot;==&gt; 3786&quot;<br>

&gt; &gt;&gt;&gt;  Time millisecondsToRun: [5000000 timesRepeat: [ a = d ]]. &quot;==&gt; 3800&quot;<br>

&gt; &gt;&gt;&gt;<br>

&gt; &gt;&gt;&gt;  &quot;Performance after adding LargePositiveInteger&gt;&gt;=&quot;<br>

&gt; &gt;&gt;&gt;  Time millisecondsToRun: [5000000 timesRepeat: [ a = b ]]. &quot;==&gt; 3868&quot;<br>

&gt; &gt;&gt;&gt;  Time millisecondsToRun: [5000000 timesRepeat: [ a = c ]]. &quot;==&gt; 3775&quot;<br>

&gt; &gt;&gt;&gt;  Time millisecondsToRun: [5000000 timesRepeat: [ a = d ]]. &quot;==&gt; 3770&quot;<br>

&gt; &gt;&gt;&gt;<br>

&gt; &gt;&gt;&gt;So yes it is something related to the VM. But I do not understand<br>

&gt; &gt;&gt;&gt;how #primDigitCompare could be so slow? As you say, maybe something<br>

&gt; &gt;&gt;&gt;is wrong with it.<br>

&gt; &gt;&gt;&gt;<br>

&gt; &gt;&gt;&gt;Thank you, I am glad that I asked for a review :-)<br>

&gt; &gt;&gt;&gt;<br>

&gt; &gt;&gt;&gt;Dave<br>

&gt; &gt;&gt;&gt;<br>

&gt; &gt;&gt;&gt;<br>

&gt; &gt;&gt;&gt;On Thu, Apr 14, 2016 at 01:45:42AM +0200, Levente Uzonyi wrote:<br>

&gt; &gt;&gt;&gt;&gt;Hi Dave,<br>

&gt; &gt;&gt;&gt;&gt;<br>

&gt; &gt;&gt;&gt;&gt;I guess this is a VM related issue. On 64-bit Spur<br>

&gt; &gt;&gt;&gt;&gt;[ 7432154326465436 digitCompare: 8432154326465436 ] bench.<br>

&gt; &gt;&gt;&gt;&gt;returns &#39;5,420,000 per second. 184 nanoseconds per run.&#39;.<br>

&gt; &gt;&gt;&gt;&gt;<br>

&gt; &gt;&gt;&gt;&gt;Removing the primitive call from #digitCompare: the number goes up:<br>

&gt; &gt;&gt;&gt;&gt;&#39;20,200,000 per second. 49.4 nanoseconds per run.&#39;.<br>

&gt; &gt;&gt;&gt;&gt;<br>

&gt; &gt;&gt;&gt;&gt;You might think that it&#39;s okay because the JIT is so good. But we have<br>

&gt; &gt;&gt;&gt;&gt;another primitive to compare two bytes-objects. One which ought to be<br>

&gt; &gt;&gt;&gt;&gt;slower than #primDigitCompare:, because it maps the bytes before doing<br>

&gt; &gt;&gt;&gt;&gt;the<br>

&gt; &gt;&gt;&gt;&gt;comparison. I even subtracted two from its result to match the result of<br>

&gt; &gt;&gt;&gt;&gt;#digitCompare:<br>

&gt; &gt;&gt;&gt;&gt;<br>

&gt; &gt;&gt;&gt;&gt;| order |<br>

&gt; &gt;&gt;&gt;&gt;order := (0 to: 255) as: ByteArray.<br>

&gt; &gt;&gt;&gt;&gt;[ (ByteString compare: 7432154326465436 with: 8432154326465436 collated:<br>

&gt; &gt;&gt;&gt;&gt;order) - 2 ] bench.<br>

&gt; &gt;&gt;&gt;&gt;<br>

&gt; &gt;&gt;&gt;&gt;But it&#39;s still about twice as quick as #primDigitCompare:.<br>

&gt; &gt;&gt;&gt;&gt;&#39;9,590,000 per second. 104 nanoseconds per run.&#39;.<br>

&gt; &gt;&gt;&gt;&gt;So, something must be wrong with #primDigitCompare:.<br>

&gt; &gt;&gt;&gt;&gt;<br>

&gt; &gt;&gt;&gt;&gt;Levente<br>

&gt; &gt;&gt;&gt;&gt;<br>

&gt; &gt;&gt;&gt;&gt;On Wed, 13 Apr 2016, David T. Lewis wrote:<br>

&gt; &gt;&gt;&gt;&gt;<br>

&gt; &gt;&gt;&gt;&gt;&gt;I would appreciate a review before moving this to trunk.<br>

&gt; &gt;&gt;&gt;&gt;&gt;<br>

&gt; &gt;&gt;&gt;&gt;&gt;Background:<br>

&gt; &gt;&gt;&gt;&gt;&gt;<br>

&gt; &gt;&gt;&gt;&gt;&gt;In UTCDateAndTime, most things are much faster than the trunk version.<br>

&gt; &gt;&gt;&gt;&gt;&gt;However, DateAndTime equality check did not improve for 32-bit images,<br>

&gt; &gt;&gt;&gt;&gt;&gt;so I ran it under AndreasSystemProfiler. Profiling showed that large<br>

&gt; &gt;&gt;&gt;&gt;&gt;integer equality checks spend most of their time in primDigitCompare, which<br>

&gt; &gt;&gt;&gt;&gt;&gt;is inefficient when only a simple byte comparison is needed.<br>

&gt; &gt;&gt;&gt;&gt;&gt;<br>

&gt; &gt;&gt;&gt;&gt;&gt;Here is the performance difference that I see on my system, 32-bit<br>

&gt; &gt;&gt;&gt;&gt;&gt;trunk<br>

&gt; &gt;&gt;&gt;&gt;&gt;Spur on Linux:<br>

&gt; &gt;&gt;&gt;&gt;&gt;<br>

&gt; &gt;&gt;&gt;&gt;&gt;| a b c d |<br>

&gt; &gt;&gt;&gt;&gt;&gt;a := 7432154326465436. &quot;a big number&quot;<br>

&gt; &gt;&gt;&gt;&gt;&gt;b := a + 1. &quot;low order digit changed&quot;<br>

&gt; &gt;&gt;&gt;&gt;&gt;c := 8432154326465436. &quot;high order digit changed&quot;<br>

&gt; &gt;&gt;&gt;&gt;&gt;d := 7432154026465436. &quot;a digit in the middle changed&quot;<br>

&gt; &gt;&gt;&gt;&gt;&gt;<br>

&gt; &gt;&gt;&gt;&gt;&gt;&quot;Base performance in the trunk image&quot;<br>

&gt; &gt;&gt;&gt;&gt;&gt;Time millisecondsToRun: [500000000 timesRepeat: [ a = b ]]. &quot;==&gt; 63733&quot;<br>

&gt; &gt;&gt;&gt;&gt;&gt;Time millisecondsToRun: [500000000 timesRepeat: [ a = c ]]. &quot;==&gt; 63152&quot;<br>

&gt; &gt;&gt;&gt;&gt;&gt;Time millisecondsToRun: [500000000 timesRepeat: [ a = d ]]. &quot;==&gt; 63581&quot;<br>

&gt; &gt;&gt;&gt;&gt;&gt;<br>

&gt; &gt;&gt;&gt;&gt;&gt;&quot;Performance after adding LargePositiveInteger&gt;&gt;=&quot;<br>

&gt; &gt;&gt;&gt;&gt;&gt;Time millisecondsToRun: [500000000 timesRepeat: [ a = b ]]. &quot;==&gt; 4676&quot;<br>

&gt; &gt;&gt;&gt;&gt;&gt;Time millisecondsToRun: [500000000 timesRepeat: [ a = c ]]. &quot;==&gt; 4883&quot;<br>

&gt; &gt;&gt;&gt;&gt;&gt;Time millisecondsToRun: [500000000 timesRepeat: [ a = d ]]. &quot;==&gt; 4512&quot;<br>

&gt; &gt;&gt;&gt;&gt;&gt;<br>

&gt; &gt;&gt;&gt;&gt;&gt;Dave<br>

&gt; &gt;&gt;&gt;&gt;&gt;<br>

&gt; &gt;&gt;&gt;&gt;&gt;<br>

&gt; &gt;&gt;&gt;&gt;&gt;<br>

&gt; &gt;&gt;&gt;&gt;&gt;On Wed, Apr 13, 2016 at 10:57:28PM +0000, <a href="mailto:commits@source.squeak.org" target="_blank">commits@source.squeak.org</a><br>

&gt; &gt;&gt;&gt;&gt;&gt;wrote:<br>

&gt; &gt;&gt;&gt;&gt;&gt;&gt;David T. Lewis uploaded a new version of Kernel to project The Inbox:<br>

&gt; &gt;&gt;&gt;&gt;&gt;&gt;<a href="http://source.squeak.org/inbox/Kernel-dtl.1015.mcz" rel="noreferrer" target="_blank">http://source.squeak.org/inbox/Kernel-dtl.1015.mcz</a><br>

&gt; &gt;&gt;&gt;&gt;&gt;&gt;<br>

&gt; &gt;&gt;&gt;&gt;&gt;&gt;==================== Summary ====================<br>

&gt; &gt;&gt;&gt;&gt;&gt;&gt;<br>

&gt; &gt;&gt;&gt;&gt;&gt;&gt;Name: Kernel-dtl.1015<br>

&gt; &gt;&gt;&gt;&gt;&gt;&gt;Author: dtl<br>

&gt; &gt;&gt;&gt;&gt;&gt;&gt;Time: 13 April 2016, 6:57:22.56608 pm<br>

&gt; &gt;&gt;&gt;&gt;&gt;&gt;UUID: bd849f91-9b00-45c5-b2ab-891b420bde5e<br>

&gt; &gt;&gt;&gt;&gt;&gt;&gt;Ancestors: Kernel-mt.1014<br>

&gt; &gt;&gt;&gt;&gt;&gt;&gt;<br>

&gt; &gt;&gt;&gt;&gt;&gt;&gt;Make large integer equality test be about 13 times faster. Implement<br>

&gt; &gt;&gt;&gt;&gt;&gt;&gt;#=<br>

&gt; &gt;&gt;&gt;&gt;&gt;&gt;in LargePositiveInteger, and use digitAt: (primitive 60) for the<br>

&gt; &gt;&gt;&gt;&gt;&gt;&gt;comparison.<br>

&gt; &gt;&gt;&gt;&gt;&gt;&gt;<br>

&gt; &gt;&gt;&gt;&gt;&gt;&gt;=============== Diff against Kernel-mt.1014 ===============<br>

&gt; &gt;&gt;&gt;&gt;&gt;&gt;<br>

&gt; &gt;&gt;&gt;&gt;&gt;&gt;Item was added:<br>

&gt; &gt;&gt;&gt;&gt;&gt;&gt;+ ----- Method: LargePositiveInteger&gt;&gt;= (in category &#39;comparing&#39;)<br>

&gt; &gt;&gt;&gt;&gt;&gt;&gt;-----<br>

&gt; &gt;&gt;&gt;&gt;&gt;&gt;+ = aNumber<br>

&gt; &gt;&gt;&gt;&gt;&gt;&gt;+<br>

&gt; &gt;&gt;&gt;&gt;&gt;&gt;+       aNumber class == self class ifTrue: [<br>

&gt; &gt;&gt;&gt;&gt;&gt;&gt;+               aNumber size = self size ifFalse: [ ^false ].<br>

&gt; &gt;&gt;&gt;&gt;&gt;&gt;+               self size to: 1 by: -1 do: [ :i | (aNumber digitAt:<br>

&gt; &gt;&gt;&gt;&gt;&gt;&gt;i) =<br>

&gt; &gt;&gt;&gt;&gt;&gt;&gt;(self digitAt: i) ifFalse: [ ^ false ] ].<br>

&gt; &gt;&gt;&gt;&gt;&gt;&gt;+               ^ true ].<br>

&gt; &gt;&gt;&gt;&gt;&gt;&gt;+       aNumber isInteger ifTrue: [ ^false ].<br>

&gt; &gt;&gt;&gt;&gt;&gt;&gt;+       aNumber isNumber ifFalse: [ ^false ].<br>

&gt; &gt;&gt;&gt;&gt;&gt;&gt;+       ^aNumber adaptToInteger: self andCompare: #=!<br>

&gt; &gt;&gt;&gt;&gt;&gt;&gt;<br>

&gt; &gt;&gt;&gt;&gt;&gt;<br>

&gt; &gt;&gt;&gt;&gt;&gt;<br>

&gt; &gt;&gt;<br>

&gt; &gt;&gt;<br>

&gt; &gt;<br>

&gt; &gt;<br>

</div></div></blockquote></div><br></div></div></div></div></div></div></div></div></div></div></div></div>

</blockquote></div><br></div>

</div></div></blockquote></div><br></div>
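
<div>The three steps Nicolas lists for the primitive (SmallInteger quick return, length check, digit check) can be modelled in a few lines of Python. This is only an illustration of the logic, not the plugin code: SMALL_INTEGER_MAX, digits_of and digit_compare are hypothetical names, the SmallInteger range is an assumption for a 32-bit VM, and 8-bit digits are used as in the non-bleeding-edge VMs.<br></div>

<pre>
```python
SMALL_INTEGER_MAX = 2**30 - 1  # assumption: SmallInteger range of a 32-bit VM


def digits_of(n: int) -> bytes:
    """Little-endian 8-bit digits of the magnitude of n (at least one digit)."""
    magnitude = abs(n)
    length = max(1, (magnitude.bit_length() + 7) // 8)
    return magnitude.to_bytes(length, "little")


def digit_compare(a: int, b: int) -> int:
    """Compare magnitudes; return -1, 0 or 1."""
    a, b = abs(a), abs(b)
    # Step 1: quick return when both operands are in the small range.
    if a <= SMALL_INTEGER_MAX and b <= SMALL_INTEGER_MAX:
        return (a > b) - (a < b)
    # Step 2: a longer digit sequence means a larger magnitude.
    da, db = digits_of(a), digits_of(b)
    if len(da) != len(db):
        return 1 if len(da) > len(db) else -1
    # Step 3: same length, so scan from the most significant digit down.
    for x, y in zip(reversed(da), reversed(db)):
        if x != y:
            return 1 if x > y else -1
    return 0
```
</pre>

<div>With the benchmark operands both numbers have 7 digits, and the scan stops at the very first (most significant) digit, which is consistent with the expectation that none of the three steps should be slow.<br></div>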