[Vm-dev] Re: primitiveDigitCompare is slow (was: Re: [squeak-dev] The Inbox: Kernel-dtl.1015.mcz)

Sat Apr 16 21:44:59 UTC 2016

Bingo,
here are the results without isKindOf checks:

[ 7432154326465436 digitCompare: 8432154326465436 ] bench.
 '16,100,000 per second. 62.1 nanoseconds per run.'

that is more or less the speed of ByteArray comparison.

I think I will make a new pass on LargeIntegersPlugin :)

2016-04-16 22:54 GMT+02:00 Nicolas Cellier <
nicolas.cellier.aka.nice at gmail.com>:

> Ah but maybe the "Smart" side of the plugin is causing trouble:
>
>     firstInteger := self
>                 primitive: 'primDigitCompare'
>                 parameters: #(#Integer )
>                 receiver: #Integer.
>
> translates as:
>
>         success(isKindOf(stackValue(0), "Integer"));
>         secondInteger = stackValue(0);
>         /* missing DebugCode */;
>         success(isKindOf(stackValue(1), "Integer"));
>         firstInteger = stackValue(1);
>
> It might be faster to just check the 3 cases:
>
> (interpreterProxy isIntegerObject: oop) or:
>     [oopClass := interpreterProxy fetchClassOf: oop.
>      oopClass == interpreterProxy classLargeNegativeInteger or: [oopClass
> == interpreterProxy classLargePositiveInteger]].
>
> Moreover, we already test isIntegerObject further in the primitive code.
> Since every LargeIntegersPlugin primitive is going thru this isKindOf
> check, there might be some low hanging fruits.
>
> 2016-04-16 19:37 GMT+02:00 Nicolas Cellier <
> nicolas.cellier.aka.nice at gmail.com>:
>
>> Hi,
>> the primitive does nothing very special
>> 1) check for SmallInteger cases first, for quick return
>> 2) check for LargeIntegers length then if both receiver & argument are
>> Large
>> 3) check for LargeIntegers digits then if both have same length
>>
>> None of these 3 steps is expected to be slow.
>>
>> A bleeding age VM does the 3rd step using 32 bits limbs instead of 8bits
>> but this does not change a thing about performances ratio, it could only
>> make a difference for giant integers, these ones fit on 56 bits...
>> I got these with 32bits Cog Sput MacOSX :
>>
>> [ 7432154326465436 digitCompare: 8432154326465436 ] bench.
>>  '8,980,000 per second. 111 nanoseconds per run.'
>>
>> | order |
>> order := (0 to: 255) as: ByteArray.
>> [ (ByteString compare: 7432154326465436 with: 8432154326465436 collated:
>> order) - 2 ] bench.
>>  '16,400,000 per second. 60.8 nanoseconds per run.'
>>
>> Obviously, most time is spent elsewhere, it would be interesting to know
>> where exactly.
>>
>> Nicolas
>>
>>
>> 2016-04-16 18:52 GMT+02:00 David T. Lewis <lewis at mail.msen.com>:
>>
>>>
>>> It does seem odd that it would be different on Spur. You suggested that
>>> compiler optimization might be set differently for the two plugins. I
>>> cannot check (I'm on an Ubuntu laptop, cannot build Spur due to some
>>> kind of autotools problem that I can't figure out right now), but if you
>>> are able to do a VM compile and save the console output to a file, then
>>> I think you can look at the CFLAGS setting for the two plugins ($ nohup
>>> ./mvm,
>>> but first comment out the part of mvm that asks if you want to clean).
>>>
>>> Aside from that, the actual generated code for primDigitCompare is
>>> doing quite a lot of stuff before it ever gets around to comparing the
>>> digits. I suspect it is not a very efficient primitive.
>>>
>>> Dave
>>>
>>>
>>> On Fri, Apr 15, 2016 at 05:52:02PM +0200, Levente Uzonyi wrote:
>>> >
>>> > This thread got derailed, and I feel like we should go back to its
>>> > main point, which was that #digitCompare: using #primitiveDigitCompare
>>> is
>>> > slow.
>>> > It's so slow, that #primitiveCompareString is about twice as quick in
>>> > Spur VMs, which is odd.
>>> >
>>> > Levente
>>> >
>>> > On Thu, 14 Apr 2016, Levente Uzonyi wrote:
>>> >
>>> > >Hi Dave,
>>> > >
>>> > >I dag a bit deeper into this problem, and I found that it's been
>>> around
>>> > >for ages. I compared the two primitives in three older images.
>>> > >The first one is Squeak 4.2 running on Cog r2714:
>>> > >
>>> > >[ 7432154326465436 digitCompare: 8432154326465436 ] bench.
>>> > >'6,880,000 per second.'
>>> > >
>>> > >| order |
>>> > >order := (0 to: 255) as: ByteArray.
>>> > >[ (ByteString compare: 7432154326465436 with: 8432154326465436
>>> collated:
>>> > >order) - 2 ] bench.
>>> > >'11,200,000 per second.'
>>> > >
>>> > >The next one was an even older Cog VM running an image updated from
>>> > >3.10.2. The VM didn't have the primitives required to fetch the VM
>>> > >information. The result were '8.22459348130374e6 per second.' and
>>> > >'1.19677724910036e7 per second.', respectively.
>>> > >
>>> > >The last image was a closure-enabled 3.10.2 running on the classic
>>> > >Interpreter VM. The results were '3.911442911417717e6 per second.' and
>>> > >'4.84567866426715e6 per second.', respectively.
>>> > >
>>> > >Since in all VMs the seemingly more complex code
>>> (primitiveCompareString
>>> > >with a subtraction) was quicker than the simpler code
>>> > >(primitiveDigitCompare), I suspect that LargeIntegersPlugin is
>>> compiled
>>> > >with less aggressive optimization than MiscPrimitivePlugin.
>>> > >
>>> > >What's also interesting is that, based on these benchmarks, the
>>> > >performance got worse over the years. I think invoking a primitive
>>> has its
>>> > >cost in newer VMs.
>>> > >
>>> > >Levente
>>> > >
>>> > >On Thu, 14 Apr 2016, David T. Lewis wrote:
>>> > >
>>> > >>Oops, I just realized that my "interpreter VM" results were from a
>>> > >>debugging
>>> > >>VM with compiler optimization off (too many VMs, sorry). Here are the
>>> > >>results
>>> > >>for a more representative interpreter VM. I can't explain the
>>> variation,
>>> > >>but
>>> > >>one conclusion is clear: my suggested "optimization" in the inbox is
>>> a
>>> > >>bad idea
>>> > >>(and I will move it to the treated inbox).
>>> > >>
>>> > >> "Base performance in the trunk level V3 image with interpreter VM"
>>> > >> Time millisecondsToRun: [100000000 timesRepeat: [ a = b ]]. "==>
>>> 26669"
>>> > >> Time millisecondsToRun: [100000000 timesRepeat: [ a = c ]]. "==>
>>> 25052"
>>> > >> Time millisecondsToRun: [100000000 timesRepeat: [ a = d ]]. "==>
>>> 25275"
>>> > >>
>>> > >> "Performance after adding LargePositiveInteger>>="
>>> > >> Time millisecondsToRun: [100000000 timesRepeat: [ a = b ]]. "==>
>>> 59224"
>>> > >> Time millisecondsToRun: [100000000 timesRepeat: [ a = c ]].  "==>
>>> 27824"
>>> > >> Time millisecondsToRun: [100000000 timesRepeat: [ a = d ]]. "==>
>>> 44324"
>>> > >>
>>> > >>Dave
>>> > >>
>>> > >>
>>> > >>
>>> > >>On Wed, Apr 13, 2016 at 08:25:33PM -0400, David T. Lewis wrote:
>>> > >>>Hi Levente,
>>> > >>>
>>> > >>>I think you may be right. I repeated my test with an interpreter VM
>>> on a
>>> > >>>32-bit image (with loop count smaller because the interpreter VM is
>>> > >>>slower
>>> > >>>than Spur). The change that I put in the inbox does not have any
>>> benefit
>>> > >>>for the interpreter VM:
>>> > >>>
>>> > >>>
>>> > >>>  a := 7432154326465436. "a big number"
>>> > >>>  b := a + 1. "low order digit changed"
>>> > >>>  c := 8432154326465436. "high order digit changed"
>>> > >>>  d := 7432154026465436. "a digit in the middle changed"
>>> > >>>
>>> > >>>   "Base performance in the trunk level V3 image with interpreter
>>> VM"
>>> > >>>  Time millisecondsToRun: [5000000 timesRepeat: [ a = b ]]. "==>
>>> 3844"
>>> > >>>  Time millisecondsToRun: [5000000 timesRepeat: [ a = c ]]. "==>
>>> 3786"
>>> > >>>  Time millisecondsToRun: [5000000 timesRepeat: [ a = d ]].. "==>
>>> 3800"
>>> > >>>
>>> > >>>  "Performance after adding LargePositiveInteger>>="
>>> > >>>  Time millisecondsToRun: [5000000 timesRepeat: [ a = b ]]. "==>
>>> 3868"
>>> > >>>  Time millisecondsToRun: [5000000 timesRepeat: [ a = c ]]. "==>
>>> 3775"
>>> > >>>  Time millisecondsToRun: [5000000 timesRepeat: [ a = d ]]. "==>
>>> 3770"
>>> > >>>
>>> > >>>So yes it is something related to the VM. But I do not understand
>>> > >>>how #primDigitCompare could be so slow? As you say, maybe something
>>> > >>>is wrong with it.
>>> > >>>
>>> > >>>Thank you, I am glad that I asked for a review :-)
>>> > >>>
>>> > >>>Dave
>>> > >>>
>>> > >>>
>>> > >>>On Thu, Apr 14, 2016 at 01:45:42AM +0200, Levente Uzonyi wrote:
>>> > >>>>Hi Dave,
>>> > >>>>
>>> > >>>>I guess this is a VM related issue. On 64-bit Spur
>>> > >>>>[ 7432154326465436 digitCompare: 8432154326465436 ] bench.
>>> > >>>>returns '5,420,000 per second. 184 nanoseconds per run.'.
>>> > >>>>
>>> > >>>>Removing the primitive call from #digitCompare: the number goes up:
>>> > >>>>'20,200,000 per second. 49.4 nanoseconds per run.'.
>>> > >>>>
>>> > >>>>You might think that it's okay because the JIT is so good. But we
>>> have
>>> > >>>>another primitive to compare two bytes-objects. One which ought be
>>> > >>>>slower than #primDigitCompare:, because it maps the bytes before
>>> doing
>>> > >>>>the
>>> > >>>>comparison. I even subtracted two from its result to match the
>>> result of
>>> > >>>>#digitCompare:
>>> > >>>>
>>> > >>>>| order |
>>> > >>>>order := (0 to: 255) as: ByteArray.
>>> > >>>>[ (ByteString compare: 7432154326465436 with: 8432154326465436
>>> collated:
>>> > >>>>order) - 2 ] bench.
>>> > >>>>
>>> > >>>>But it's still about twice as quick as #primDigitCompare:.
>>> > >>>>'9,590,000 per second. 104 nanoseconds per run.'.
>>> > >>>>So, something must be wrong with #primDigitCompare:.
>>> > >>>>
>>> > >>>>Levente
>>> > >>>>
>>> > >>>>On Wed, 13 Apr 2016, David T. Lewis wrote:
>>> > >>>>
>>> > >>>>>I would appreciate a review before moving this to trunk.
>>> > >>>>>
>>> > >>>>>Background:
>>> > >>>>>
>>> > >>>>>In UTCDateAndTime, most things are much faster than the trunk
>>> version.
>>> > >>>>>However, DataAndTime equality check did not improve for 32-bit
>>> images,
>>> > >>>>>so I ran it under AndreasSystemProfiler. Profiling showed that
>>> large
>>> > >>>>>integer equality checks spends time mostly in primDigitCompare,
>>> which
>>> > >>>>>is inefficient when only a simple byte comparison is needed.
>>> > >>>>>
>>> > >>>>>Here is the performance difference that I see on my system, 32-bit
>>> > >>>>>trunk
>>> > >>>>>Spur on Linux:
>>> > >>>>>
>>> > >>>>>| a b c d |
>>> > >>>>>a := 7432154326465436. "a big number"
>>> > >>>>>b := a + 1. "low order digit changed"
>>> > >>>>>c := 8432154326465436. "high order digit changed"
>>> > >>>>>d := 7432154026465436. "a digit in the middle changed"
>>> > >>>>>
>>> > >>>>>"Base performance in the trunk image"
>>> > >>>>>Time millisecondsToRun: [500000000 timesRepeat: [ a = b ]]. "==>
>>> 63733"
>>> > >>>>>Time millisecondsToRun: [500000000 timesRepeat: [ a = c ]]. "==>
>>> 63152"
>>> > >>>>>Time millisecondsToRun: [500000000 timesRepeat: [ a = d ]]. "==>
>>> 63581"
>>> > >>>>>
>>> > >>>>>"Performance after adding LargePositiveInteger>>="
>>> > >>>>>Time millisecondsToRun: [500000000 timesRepeat: [ a = b ]]. "==>
>>> 4676"
>>> > >>>>>Time millisecondsToRun: [500000000 timesRepeat: [ a = c ]]. "==>
>>> 4883"
>>> > >>>>>Time millisecondsToRun: [500000000 timesRepeat: [ a = d ]]. "==>
>>> 4512"
>>> > >>>>>
>>> > >>>>>Dave
>>> > >>>>>
>>> > >>>>>
>>> > >>>>>
>>> > >>>>>On Wed, Apr 13, 2016 at 10:57:28PM +0000,
>>> commits at source.squeak.org
>>> > >>>>>wrote:
>>> > >>>>>>David T. Lewis uploaded a new version of Kernel to project The
>>> Inbox:
>>> > >>>>>>http://source.squeak.org/inbox/Kernel-dtl.1015.mcz
>>> > >>>>>>
>>> > >>>>>>==================== Summary ====================
>>> > >>>>>>
>>> > >>>>>>Name: Kernel-dtl.1015
>>> > >>>>>>Author: dtl
>>> > >>>>>>Time: 13 April 2016, 6:57:22.56608 pm
>>> > >>>>>>UUID: bd849f91-9b00-45c5-b2ab-891b420bde5e
>>> > >>>>>>Ancestors: Kernel-mt.1014
>>> > >>>>>>
>>> > >>>>>>Make large integer equality test be about 13 times faster.
>>> Implement
>>> > >>>>>>#=
>>> > >>>>>>in LargePositiveInteger, and use digitAt: (primitive 60) for the
>>> > >>>>>>comparison.
>>> > >>>>>>
>>> > >>>>>>=============== Diff against Kernel-mt.1014 ===============
>>> > >>>>>>
>>> > >>>>>>Item was added:
>>> > >>>>>>+ ----- Method: LargePositiveInteger>>= (in category 'comparing')
>>> > >>>>>>-----
>>> > >>>>>>+ = aNumber
>>> > >>>>>>+
>>> > >>>>>>+       aNumber class == self class ifTrue: [
>>> > >>>>>>+               aNumber size = self size ifFalse: [ ^false ].
>>> > >>>>>>+               self size to: 1 by: -1 do: [ :i | (aNumber
>>> digitAt:
>>> > >>>>>>i) =
>>> > >>>>>>(self digitAt: i) ifFalse: [ ^ false ] ].
>>> > >>>>>>+               ^ true ].
>>> > >>>>>>+       aNumber isInteger ifTrue: [ ^false ].
>>> > >>>>>>+       aNumber isNumber ifFalse: [ ^false ].
>>> > >>>>>>+       ^aNumber adaptToInteger: self andCompare: #=!
>>> > >>>>>>
>>> > >>>>>
>>> > >>>>>
>>> > >>
>>> > >>
>>> > >
>>> > >
>>>
>>
>>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.squeakfoundation.org/pipermail/vm-dev/attachments/20160416/ac2d7a14/attachment-0001.htm