<div dir="ltr"><div><div><div>Bingo,<br></div>here are the results without isKindOf checks:<br><br>[ 7432154326465436 digitCompare: 8432154326465436 ] bench.<br> '16,100,000 per second. 62.1 nanoseconds per run.'<br><br></div>That is more or less the speed of the ByteArray comparison.<br><br></div>I think I will make a new pass on LargeIntegersPlugin :)<br></div><div class="gmail_extra"><br><div class="gmail_quote">2016-04-16 22:54 GMT+02:00 Nicolas Cellier <span dir="ltr"><<a href="mailto:nicolas.cellier.aka.nice@gmail.com" target="_blank">nicolas.cellier.aka.nice@gmail.com</a>></span>:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div>Ah, but maybe the "Smart" side of the plugin is causing trouble:<br><br> firstInteger := self<br> primitive: 'primDigitCompare'<br> parameters: #(#Integer )<br> receiver: #Integer.<br><br>translates as:<br><br> success(isKindOf(stackValue(0), "Integer"));<br> secondInteger = stackValue(0);<br> /* missing DebugCode */;<br> success(isKindOf(stackValue(1), "Integer"));<br> firstInteger = stackValue(1);<br><br></div><div>It might be faster to just check the 3 cases:<br><br>(interpreterProxy isIntegerObject: oop) or:<br> [oopClass := interpreterProxy fetchClassOf: oop.<br> oopClass == interpreterProxy classLargeNegativeInteger or: [oopClass == interpreterProxy classLargePositiveInteger]].<br><br></div><div>Moreover, we already test isIntegerObject further down in the primitive code.<br></div><div>Since every LargeIntegersPlugin primitive goes through this isKindOf check, there might be some low-hanging fruit.<br></div></div><div class="HOEnZb"><div class="h5"><div class="gmail_extra"><br><div class="gmail_quote">2016-04-16 19:37 GMT+02:00 Nicolas Cellier <span dir="ltr"><<a href="mailto:nicolas.cellier.aka.nice@gmail.com" target="_blank">nicolas.cellier.aka.nice@gmail.com</a>></span>:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc 
solid;padding-left:1ex"><div dir="ltr"><div><div><div><div><div><div><div>Hi,<br></div>the primitive does nothing very special:<br></div>1) check for SmallInteger cases first, for a quick return<br></div>2) if both receiver and argument are LargeIntegers, compare their lengths<br></div>3) if both have the same length, compare their digits<br><br></div><div>None of these 3 steps is expected to be slow.<br><br></div>A bleeding-edge VM does the 3rd step using 32-bit limbs instead of 8-bit ones, but this does not change the performance ratio; it could only make a difference for giant integers, and these ones fit in 56 bits.<br>I got these with 32-bit Cog Spur on Mac OS X:<br><br>[ 7432154326465436 digitCompare: 8432154326465436 ] bench.<br> '8,980,000 per second. 111 nanoseconds per run.'<span><br><br>| order |<br>order := (0 to: 255) as: ByteArray.<br>[ (ByteString compare: 7432154326465436 with: 8432154326465436 collated: order) - 2 ] bench.<br></span> '16,400,000 per second. 60.8 nanoseconds per run.'<br><br></div>Obviously, most of the time is spent elsewhere; it would be interesting to know exactly where.<span><font color="#888888"><br><br></font></span></div><span><font color="#888888">Nicolas</font></span><div><div><br><div><div><div><div><div><div><div><div><div class="gmail_extra"><br><div class="gmail_quote">2016-04-16 18:52 GMT+02:00 David T. Lewis <span dir="ltr"><<a href="mailto:lewis@mail.msen.com" target="_blank">lewis@mail.msen.com</a>></span>:<br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><br>
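The C sketch below illustrates two points from Nicolas's mails. The first is the cost difference between an isKindOf-style guard and his three-case check. The second is the three comparison steps of the primitive. The `Class`/`Obj` structs, the class chain and every function name are hypothetical stand-ins, not the VM's object model or the plugin's generated code. The sketch also assumes that isKindOf walks the superclass chain and compares class names at each step.<br>

```c
#include "assert.h" /* quoted form falls back to the system header */

/* Hypothetical, simplified object model for illustration only;
   this is NOT the VM's real object layout. */
typedef struct Class {
    const char *name;
    const struct Class *superclass;
} Class;

typedef struct Obj {
    int is_small;                /* stands in for the SmallInteger tag bit */
    long long small_value;       /* valid when is_small is set */
    const Class *cls;            /* class pointer */
    const unsigned char *digits; /* LargeInteger bytes, least significant first */
    unsigned len;                /* number of bytes in digits */
} Obj;

/* Illustrative class chain (hypothetical). */
static const Class Object_ = {"Object", 0};
static const Class Magnitude_ = {"Magnitude", &Object_};
static const Class Number_ = {"Number", &Magnitude_};
static const Class Integer_ = {"Integer", &Number_};
static const Class SmallInteger_ = {"SmallInteger", &Integer_};
static const Class LPI = {"LargePositiveInteger", &Integer_};
static const Class LNI = {"LargeNegativeInteger", &LPI};

static Obj make_small(long long v) {
    Obj o = {1, v, &SmallInteger_, 0, 0};
    return o;
}

static Obj make_large(const Class *cls, const unsigned char *digits, unsigned len) {
    Obj o = {0, 0, cls, digits, len};
    return o;
}

/* Byte-by-byte string equality. */
static int same_name(const char *a, const char *b) {
    while (*a != '\0' && *a == *b) { a++; b++; }
    return *a == *b;
}

/* Slow guard, ASSUMING isKindOf walks the superclass chain and
   compares class names as strings at every step. */
static int is_kind_of_slow(const Obj *o, const char *class_name) {
    const Class *c;
    for (c = o->cls; c != 0; c = c->superclass)
        if (same_name(c->name, class_name)) return 1;
    return 0;
}

/* Cheap guard: the three cases from the mail, at most three comparisons. */
static int is_integer_fast(const Obj *o) {
    return o->is_small || o->cls == &LPI || o->cls == &LNI;
}

/* Magnitude comparison in three steps; answers 1, 0 or -1. */
static int digit_compare(const Obj *x, const Obj *y) {
    if (x->is_small && y->is_small) {          /* step 1: quick SmallInteger path */
        long long a = x->small_value > 0 ? x->small_value : -x->small_value;
        long long b = y->small_value > 0 ? y->small_value : -y->small_value;
        return a == b ? 0 : (a > b ? 1 : -1);
    }
    /* Assumes normalized integers: a LargeInteger always has a larger
       magnitude than any SmallInteger. */
    if (x->is_small) return -1;
    if (y->is_small) return 1;
    if (x->len != y->len)                       /* step 2: compare lengths */
        return x->len > y->len ? 1 : -1;
    for (unsigned i = x->len; i-- > 0;)         /* step 3: digits, most significant first */
        if (x->digits[i] != y->digits[i])
            return x->digits[i] > y->digits[i] ? 1 : -1;
    return 0;
}
```

The fast guard answers after at most three pointer comparisons. Under the stated assumption, the isKindOf path does a string comparison for every class on the way up the chain. That would be consistent with the speedup reported above once the isKindOf checks were removed.<br>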
It does seem odd that it would be different on Spur. You suggested that<br>
compiler optimization might be set differently for the two plugins. I<br>
cannot check (I'm on an Ubuntu laptop and cannot build Spur due to some<br>
kind of autotools problem that I can't figure out right now), but if you<br>
are able to do a VM compile and save the console output to a file, then<br>
I think you can look at the CFLAGS setting for the two plugins ($ nohup ./mvm,<br>
but first comment out the part of mvm that asks if you want to clean).<br>
<br>
Aside from that, the actual generated code for primDigitCompare is<br>
doing quite a lot of stuff before it ever gets around to comparing the<br>
digits. I suspect it is not a very efficient primitive.<br>
<br>
Dave<br>
<div><div><br>
<br>
On Fri, Apr 15, 2016 at 05:52:02PM +0200, Levente Uzonyi wrote:<br>
><br>
> This thread got derailed, and I feel like we should go back to its<br>
> main point, which was that #digitCompare: using #primitiveDigitCompare is<br>
> slow.<br>
> It's so slow that #primitiveCompareString is about twice as quick in<br>
> Spur VMs, which is odd.<br>
><br>
> Levente<br>
><br>
> On Thu, 14 Apr 2016, Levente Uzonyi wrote:<br>
><br>
> >Hi Dave,<br>
> ><br>
> >I dug a bit deeper into this problem, and I found that it's been around<br>
> >for ages. I compared the two primitives in three older images.<br>
> >The first one is Squeak 4.2 running on Cog r2714:<br>
> ><br>
> >[ 7432154326465436 digitCompare: 8432154326465436 ] bench.<br>
> >'6,880,000 per second.'<br>
> ><br>
> >| order |<br>
> >order := (0 to: 255) as: ByteArray.<br>
> >[ (ByteString compare: 7432154326465436 with: 8432154326465436 collated:<br>
> >order) - 2 ] bench.<br>
> >'11,200,000 per second.'<br>
> ><br>
> >The next one was an even older Cog VM running an image updated from<br>
> >3.10.2. The VM didn't have the primitives required to fetch the VM<br>
> >information. The results were '8.22459348130374e6 per second.' and<br>
> >'1.19677724910036e7 per second.', respectively.<br>
> ><br>
> >The last image was a closure-enabled 3.10.2 running on the classic<br>
> >Interpreter VM. The results were '3.911442911417717e6 per second.' and<br>
> >'4.84567866426715e6 per second.', respectively.<br>
> ><br>
> >Since in all VMs the seemingly more complex code (primitiveCompareString<br>
> >with a subtraction) was quicker than the simpler code<br>
> >(primitiveDigitCompare), I suspect that LargeIntegersPlugin is compiled<br>
> >with less aggressive optimization than MiscPrimitivePlugin.<br>
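The "per second" figures above can be converted to nanoseconds per run (1e9 divided by the rate) and to a speed ratio between the two primitives. The small C helper below uses only the numbers reported in this thread. The struct, the table and the function names are hypothetical.<br>

```c
#include "assert.h" /* quoted form falls back to the system header */

/* Runs per second reported in this thread for #digitCompare: and for
   ByteString compare:with:collated: minus 2. */
typedef struct {
    const char *setup;
    double digit_compare;
    double collated_compare;
} Bench;

static const Bench reported[] = {
    {"Squeak 4.2 on Cog r2714", 6880000.0, 11200000.0},
    {"image updated from 3.10.2 on an older Cog", 8.22459348130374e6, 1.19677724910036e7},
    {"closure-enabled 3.10.2 on the Interpreter VM", 3.911442911417717e6, 4.84567866426715e6},
};

/* Convert a "runs per second" figure into nanoseconds per run. */
static double ns_per_run(double runs_per_second) {
    return 1e9 / runs_per_second;
}

/* How many times faster the collated compare ran than #digitCompare:. */
static double speed_ratio(const Bench *b) {
    return b->collated_compare / b->digit_compare;
}
```

For Cog r2714 this gives about 145 ns per run for digitCompare: versus 89 ns for the collated compare. The speed ratios across the three setups are roughly 1.6, 1.5 and 1.2. So the collated compare wins everywhere, by the smallest margin on the classic Interpreter VM.<br>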
> ><br>
> >What's also interesting is that, based on these benchmarks, the<br>
> >performance got worse over the years. I think invoking a primitive has become<br>
> >more expensive in newer VMs.<br>
> ><br>
> >Levente<br>
> ><br>
> >On Thu, 14 Apr 2016, David T. Lewis wrote:<br>
> ><br>
> >>Oops, I just realized that my "interpreter VM" results were from a<br>
> >>debugging VM with compiler optimization off (too many VMs, sorry). Here are the<br>
> >>results for a more representative interpreter VM. I can't explain the variation,<br>
> >>but one conclusion is clear: my suggested "optimization" in the inbox is a<br>
> >>bad idea (and I will move it to the treated inbox).<br>
> >><br>
> >> "Base performance in the trunk level V3 image with interpreter VM"<br>
> >> Time millisecondsToRun: [100000000 timesRepeat: [ a = b ]]. "==> 26669"<br>
> >> Time millisecondsToRun: [100000000 timesRepeat: [ a = c ]]. "==> 25052"<br>
> >> Time millisecondsToRun: [100000000 timesRepeat: [ a = d ]]. "==> 25275"<br>
> >><br>
> >> "Performance after adding LargePositiveInteger>>="<br>
> >> Time millisecondsToRun: [100000000 timesRepeat: [ a = b ]]. "==> 59224"<br>
> >> Time millisecondsToRun: [100000000 timesRepeat: [ a = c ]]. "==> 27824"<br>
> >> Time millisecondsToRun: [100000000 timesRepeat: [ a = d ]]. "==> 44324"<br>
> >><br>
> >>Dave<br>
> >><br>
> >><br>
> >><br>
> >>On Wed, Apr 13, 2016 at 08:25:33PM -0400, David T. Lewis wrote:<br>
> >>>Hi Levente,<br>
> >>><br>
> >>>I think you may be right. I repeated my test with an interpreter VM on a<br>
> >>>32-bit image (with a smaller loop count, because the interpreter VM is<br>
> >>>slower than Spur). The change that I put in the inbox does not have any benefit<br>
> >>>for the interpreter VM:<br>
> >>><br>
> >>><br>
> >>> a := 7432154326465436. "a big number"<br>
> >>> b := a + 1. "low order digit changed"<br>
> >>> c := 8432154326465436. "high order digit changed"<br>
> >>> d := 7432154026465436. "a digit in the middle changed"<br>
> >>><br>
> >>> "Base performance in the trunk level V3 image with interpreter VM"<br>
> >>> Time millisecondsToRun: [5000000 timesRepeat: [ a = b ]]. "==> 3844"<br>
> >>> Time millisecondsToRun: [5000000 timesRepeat: [ a = c ]]. "==> 3786"<br>
> >>> Time millisecondsToRun: [5000000 timesRepeat: [ a = d ]]. "==> 3800"<br>
> >>><br>
> >>> "Performance after adding LargePositiveInteger>>="<br>
> >>> Time millisecondsToRun: [5000000 timesRepeat: [ a = b ]]. "==> 3868"<br>
> >>> Time millisecondsToRun: [5000000 timesRepeat: [ a = c ]]. "==> 3775"<br>
> >>> Time millisecondsToRun: [5000000 timesRepeat: [ a = d ]]. "==> 3770"<br>
> >>><br>
> >>>So yes, it is something related to the VM. But I do not understand<br>
> >>>how #primDigitCompare could be so slow. As you say, maybe something<br>
> >>>is wrong with it.<br>
> >>><br>
> >>>Thank you, I am glad that I asked for a review :-)<br>
> >>><br>
> >>>Dave<br>
> >>><br>
> >>><br>
> >>>On Thu, Apr 14, 2016 at 01:45:42AM +0200, Levente Uzonyi wrote:<br>
> >>>>Hi Dave,<br>
> >>>><br>
> >>>>I guess this is a VM-related issue. On 64-bit Spur<br>
> >>>>[ 7432154326465436 digitCompare: 8432154326465436 ] bench.<br>
> >>>>returns '5,420,000 per second. 184 nanoseconds per run.'.<br>
> >>>><br>
> >>>>After removing the primitive call from #digitCompare:, the number goes up:<br>
> >>>>'20,200,000 per second. 49.4 nanoseconds per run.'.<br>
> >>>><br>
> >>>>You might think that it's okay because the JIT is so good. But we have<br>
> >>>>another primitive to compare two bytes-objects, one which ought to be<br>
> >>>>slower than #primDigitCompare:, because it maps the bytes before doing<br>
> >>>>the comparison. I even subtracted two from its result to match the result of<br>
> >>>>#digitCompare:<br>
> >>>><br>
> >>>>| order |<br>
> >>>>order := (0 to: 255) as: ByteArray.<br>
> >>>>[ (ByteString compare: 7432154326465436 with: 8432154326465436 collated:<br>
> >>>>order) - 2 ] bench.<br>
> >>>><br>
> >>>>But it's still about twice as quick as #primDigitCompare:.<br>
> >>>>'9,590,000 per second. 104 nanoseconds per run.'.<br>
> >>>>So, something must be wrong with #primDigitCompare:.<br>
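For comparison, the C sketch below shows what a collated byte compare has to do. It is modeled on the Smalltalk fallback code of ByteString class>>compare:with:collated:. That is an assumption: the MiscPrimitivePlugin C source may differ in detail, and the function name is hypothetical. It answers 1, 2 or 3, which is why subtracting 2 yields the -1/0/1 convention of #digitCompare:.<br>

```c
#include "assert.h" /* quoted form falls back to the system header */

/* Answers 1, 2 or 3 when s1 sorts before, equal to, or after s2,
   after mapping every byte through the 256-entry order table. */
static int compare_collated(const unsigned char *s1, unsigned len1,
                            const unsigned char *s2, unsigned len2,
                            const unsigned char order[256]) {
    unsigned n = len1 > len2 ? len2 : len1;  /* min(len1, len2) */
    for (unsigned i = 0; i != n; i++) {
        unsigned char c1 = order[s1[i]];     /* table lookup per byte */
        unsigned char c2 = order[s2[i]];
        if (c1 != c2) return c2 > c1 ? 1 : 3;
    }
    if (len1 == len2) return 2;
    return len2 > len1 ? 1 : 3;              /* a proper prefix sorts first */
}
```

The per-byte table lookup is the extra work Levente refers to. The scan also starts at the first byte, which for a LargeInteger holds the least significant digit. On large integers this primitive is therefore a timing reference, not a numeric comparison.<br>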
> >>>><br>
> >>>>Levente<br>
> >>>><br>
> >>>>On Wed, 13 Apr 2016, David T. Lewis wrote:<br>
> >>>><br>
> >>>>>I would appreciate a review before moving this to trunk.<br>
> >>>>><br>
> >>>>>Background:<br>
> >>>>><br>
> >>>>>In UTCDateAndTime, most things are much faster than the trunk version.<br>
> >>>>>However, DateAndTime equality checks did not improve for 32-bit images,<br>
> >>>>>so I ran it under AndreasSystemProfiler. Profiling showed that large<br>
> >>>>>integer equality checks spend most of their time in primDigitCompare, which<br>
> >>>>>is inefficient when only a simple byte comparison is needed.<br>
> >>>>><br>
> >>>>>Here is the performance difference that I see on my system, 32-bit trunk<br>
> >>>>>Spur on Linux:<br>
> >>>>><br>
> >>>>>| a b c d |<br>
> >>>>>a := 7432154326465436. "a big number"<br>
> >>>>>b := a + 1. "low order digit changed"<br>
> >>>>>c := 8432154326465436. "high order digit changed"<br>
> >>>>>d := 7432154026465436. "a digit in the middle changed"<br>
> >>>>><br>
> >>>>>"Base performance in the trunk image"<br>
> >>>>>Time millisecondsToRun: [500000000 timesRepeat: [ a = b ]]. "==> 63733"<br>
> >>>>>Time millisecondsToRun: [500000000 timesRepeat: [ a = c ]]. "==> 63152"<br>
> >>>>>Time millisecondsToRun: [500000000 timesRepeat: [ a = d ]]. "==> 63581"<br>
> >>>>><br>
> >>>>>"Performance after adding LargePositiveInteger>>="<br>
> >>>>>Time millisecondsToRun: [500000000 timesRepeat: [ a = b ]]. "==> 4676"<br>
> >>>>>Time millisecondsToRun: [500000000 timesRepeat: [ a = c ]]. "==> 4883"<br>
> >>>>>Time millisecondsToRun: [500000000 timesRepeat: [ a = d ]]. "==> 4512"<br>
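The comments on a, b, c and d refer to digit positions. A LargePositiveInteger stores its magnitude as little-endian bytes, so digitAt: 1 is the least significant digit. The C sketch below finds the most significant differing digit for each pair. It uses a 64-bit unsigned value in place of the byte object, and the helper names are hypothetical.<br>

```c
#include "assert.h" /* quoted form falls back to the system header */

/* Number of 8-bit digits needed for a value (at least 1). */
static int digit_length(unsigned long long v) {
    int n = 1;
    while (v > 0xFF) { v >>= 8; n++; }
    return n;
}

/* Digit i, with 0 the least significant; corresponds to digitAt: i + 1. */
static unsigned digit_at(unsigned long long v, int i) {
    return (unsigned)((v >> (8 * i)) & 0xFF);
}

/* Index of the most significant digit where x and y differ, or -1 if equal.
   A top-down scan examines (length - index) digits before it can answer. */
static int highest_differing_digit(unsigned long long x, unsigned long long y) {
    int n = digit_length(x > y ? x : y);
    for (int i = n - 1; i >= 0; i--)
        if (digit_at(x, i) != digit_at(y, i)) return i;
    return -1;
}
```

All four values are 7 digits long. A top-down scan stops after 1 digit for c (index 6), after 4 digits for d (index 3), and only after all 7 for b (index 0). The nearly identical timings for b, c and d above fit Nicolas's observation that most of the time is spent outside the digit loop.<br>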
> >>>>><br>
> >>>>>Dave<br>
> >>>>><br>
> >>>>><br>
> >>>>><br>
> >>>>>On Wed, Apr 13, 2016 at 10:57:28PM +0000, <a href="mailto:commits@source.squeak.org" target="_blank">commits@source.squeak.org</a> wrote:<br>
> >>>>>>David T. Lewis uploaded a new version of Kernel to project The Inbox:<br>
> >>>>>><a href="http://source.squeak.org/inbox/Kernel-dtl.1015.mcz" rel="noreferrer" target="_blank">http://source.squeak.org/inbox/Kernel-dtl.1015.mcz</a><br>
> >>>>>><br>
> >>>>>>==================== Summary ====================<br>
> >>>>>><br>
> >>>>>>Name: Kernel-dtl.1015<br>
> >>>>>>Author: dtl<br>
> >>>>>>Time: 13 April 2016, 6:57:22.56608 pm<br>
> >>>>>>UUID: bd849f91-9b00-45c5-b2ab-891b420bde5e<br>
> >>>>>>Ancestors: Kernel-mt.1014<br>
> >>>>>><br>
> >>>>>>Make large integer equality test be about 13 times faster. Implement #=<br>
> >>>>>>in LargePositiveInteger, and use digitAt: (primitive 60) for the comparison.<br>
> >>>>>><br>
> >>>>>>=============== Diff against Kernel-mt.1014 ===============<br>
> >>>>>><br>
> >>>>>>Item was added:<br>
> >>>>>>+ ----- Method: LargePositiveInteger>>= (in category 'comparing') -----<br>
> >>>>>>+ = aNumber<br>
> >>>>>>+<br>
> >>>>>>+ aNumber class == self class ifTrue: [<br>
> >>>>>>+ aNumber size = self size ifFalse: [ ^false ].<br>
> >>>>>>+ self size to: 1 by: -1 do: [ :i | (aNumber digitAt: i) = (self digitAt: i) ifFalse: [ ^ false ] ].<br>
> >>>>>>+ ^ true ].<br>
> >>>>>>+ aNumber isInteger ifTrue: [ ^false ].<br>
> >>>>>>+ aNumber isNumber ifFalse: [ ^false ].<br>
> >>>>>>+ ^aNumber adaptToInteger: self andCompare: #=!<br>
> >>>>>><br>
> >>>>><br>
> >>>>><br>
> >><br>
> >><br>
> ><br>
> ><br>
</div></div></blockquote></div><br></div></div></div></div></div></div></div></div></div></div></div></div>
</blockquote></div><br></div>
</div></div></blockquote></div><br></div>