[Vm-dev] Re: primitiveDigitCompare is slow (was: Re: [squeak-dev] The Inbox: Kernel-dtl.1015.mcz)

Sat Apr 16 20:54:18 UTC 2016

Ah but maybe the "Smart" side of the plugin is causing trouble:

    firstInteger := self
                primitive: 'primDigitCompare'
                parameters: #(#Integer )
                receiver: #Integer.

translates as:

        success(isKindOf(stackValue(0), "Integer"));
        secondInteger = stackValue(0);
        /* missing DebugCode */;
        success(isKindOf(stackValue(1), "Integer"));
        firstInteger = stackValue(1);

It might be faster to just check the 3 cases:

(interpreterProxy isIntegerObject: oop) or:
    [oopClass := interpreterProxy fetchClassOf: oop.
     oopClass == interpreterProxy classLargeNegativeInteger or: [oopClass
== interpreterProxy classLargePositiveInteger]].

Moreover, we already test isIntegerObject further in the primitive code.
Since every LargeIntegersPlugin primitive is going thru this isKindOf
check, there might be some low hanging fruits.

2016-04-16 19:37 GMT+02:00 Nicolas Cellier <
nicolas.cellier.aka.nice at gmail.com>:

> Hi,
> the primitive does nothing very special
> 1) check for SmallInteger cases first, for quick return
> 2) check for LargeIntegers length then if both receiver & argument are
> Large
> 3) check for LargeIntegers digits then if both have same length
>
> None of these 3 steps is expected to be slow.
>
> A bleeding age VM does the 3rd step using 32 bits limbs instead of 8bits
> but this does not change a thing about performances ratio, it could only
> make a difference for giant integers, these ones fit on 56 bits...
> I got these with 32bits Cog Sput MacOSX :
>
> [ 7432154326465436 digitCompare: 8432154326465436 ] bench.
>  '8,980,000 per second. 111 nanoseconds per run.'
>
> | order |
> order := (0 to: 255) as: ByteArray.
> [ (ByteString compare: 7432154326465436 with: 8432154326465436 collated:
> order) - 2 ] bench.
>  '16,400,000 per second. 60.8 nanoseconds per run.'
>
> Obviously, most time is spent elsewhere, it would be interesting to know
> where exactly.
>
> Nicolas
>
>
> 2016-04-16 18:52 GMT+02:00 David T. Lewis <lewis at mail.msen.com>:
>
>>
>> It does seem odd that it would be different on Spur. You suggested that
>> compiler optimization might be set differently for the two plugins. I
>> cannot check (I'm on an Ubuntu laptop, cannot build Spur due to some
>> kind of autotools problem that I can't figure out right now), but if you
>> are able to do a VM compile and save the console output to a file, then
>> I think you can look at the CFLAGS setting for the two plugins ($ nohup
>> ./mvm,
>> but first comment out the part of mvm that asks if you want to clean).
>>
>> Aside from that, the actual generated code for primDigitCompare is
>> doing quite a lot of stuff before it ever gets around to comparing the
>> digits. I suspect it is not a very efficient primitive.
>>
>> Dave
>>
>>
>> On Fri, Apr 15, 2016 at 05:52:02PM +0200, Levente Uzonyi wrote:
>> >
>> > This thread got derailed, and I feel like we should go back to its
>> > main point, which was that #digitCompare: using #primitiveDigitCompare
>> is
>> > slow.
>> > It's so slow, that #primitiveCompareString is about twice as quick in
>> > Spur VMs, which is odd.
>> >
>> > Levente
>> >
>> > On Thu, 14 Apr 2016, Levente Uzonyi wrote:
>> >
>> > >Hi Dave,
>> > >
>> > >I dag a bit deeper into this problem, and I found that it's been around
>> > >for ages. I compared the two primitives in three older images.
>> > >The first one is Squeak 4.2 running on Cog r2714:
>> > >
>> > >[ 7432154326465436 digitCompare: 8432154326465436 ] bench.
>> > >'6,880,000 per second.'
>> > >
>> > >| order |
>> > >order := (0 to: 255) as: ByteArray.
>> > >[ (ByteString compare: 7432154326465436 with: 8432154326465436
>> collated:
>> > >order) - 2 ] bench.
>> > >'11,200,000 per second.'
>> > >
>> > >The next one was an even older Cog VM running an image updated from
>> > >3.10.2. The VM didn't have the primitives required to fetch the VM
>> > >information. The result were '8.22459348130374e6 per second.' and
>> > >'1.19677724910036e7 per second.', respectively.
>> > >
>> > >The last image was a closure-enabled 3.10.2 running on the classic
>> > >Interpreter VM. The results were '3.911442911417717e6 per second.' and
>> > >'4.84567866426715e6 per second.', respectively.
>> > >
>> > >Since in all VMs the seemingly more complex code
>> (primitiveCompareString
>> > >with a subtraction) was quicker than the simpler code
>> > >(primitiveDigitCompare), I suspect that LargeIntegersPlugin is compiled
>> > >with less aggressive optimization than MiscPrimitivePlugin.
>> > >
>> > >What's also interesting is that, based on these benchmarks, the
>> > >performance got worse over the years. I think invoking a primitive has
>> its
>> > >cost in newer VMs.
>> > >
>> > >Levente
>> > >
>> > >On Thu, 14 Apr 2016, David T. Lewis wrote:
>> > >
>> > >>Oops, I just realized that my "interpreter VM" results were from a
>> > >>debugging
>> > >>VM with compiler optimization off (too many VMs, sorry). Here are the
>> > >>results
>> > >>for a more representative interpreter VM. I can't explain the
>> variation,
>> > >>but
>> > >>one conclusion is clear: my suggested "optimization" in the inbox is a
>> > >>bad idea
>> > >>(and I will move it to the treated inbox).
>> > >>
>> > >> "Base performance in the trunk level V3 image with interpreter VM"
>> > >> Time millisecondsToRun: [100000000 timesRepeat: [ a = b ]]. "==>
>> 26669"
>> > >> Time millisecondsToRun: [100000000 timesRepeat: [ a = c ]]. "==>
>> 25052"
>> > >> Time millisecondsToRun: [100000000 timesRepeat: [ a = d ]]. "==>
>> 25275"
>> > >>
>> > >> "Performance after adding LargePositiveInteger>>="
>> > >> Time millisecondsToRun: [100000000 timesRepeat: [ a = b ]]. "==>
>> 59224"
>> > >> Time millisecondsToRun: [100000000 timesRepeat: [ a = c ]].  "==>
>> 27824"
>> > >> Time millisecondsToRun: [100000000 timesRepeat: [ a = d ]]. "==>
>> 44324"
>> > >>
>> > >>Dave
>> > >>
>> > >>
>> > >>
>> > >>On Wed, Apr 13, 2016 at 08:25:33PM -0400, David T. Lewis wrote:
>> > >>>Hi Levente,
>> > >>>
>> > >>>I think you may be right. I repeated my test with an interpreter VM
>> on a
>> > >>>32-bit image (with loop count smaller because the interpreter VM is
>> > >>>slower
>> > >>>than Spur). The change that I put in the inbox does not have any
>> benefit
>> > >>>for the interpreter VM:
>> > >>>
>> > >>>
>> > >>>  a := 7432154326465436. "a big number"
>> > >>>  b := a + 1. "low order digit changed"
>> > >>>  c := 8432154326465436. "high order digit changed"
>> > >>>  d := 7432154026465436. "a digit in the middle changed"
>> > >>>
>> > >>>   "Base performance in the trunk level V3 image with interpreter VM"
>> > >>>  Time millisecondsToRun: [5000000 timesRepeat: [ a = b ]]. "==>
>> 3844"
>> > >>>  Time millisecondsToRun: [5000000 timesRepeat: [ a = c ]]. "==>
>> 3786"
>> > >>>  Time millisecondsToRun: [5000000 timesRepeat: [ a = d ]].. "==>
>> 3800"
>> > >>>
>> > >>>  "Performance after adding LargePositiveInteger>>="
>> > >>>  Time millisecondsToRun: [5000000 timesRepeat: [ a = b ]]. "==>
>> 3868"
>> > >>>  Time millisecondsToRun: [5000000 timesRepeat: [ a = c ]]. "==>
>> 3775"
>> > >>>  Time millisecondsToRun: [5000000 timesRepeat: [ a = d ]]. "==>
>> 3770"
>> > >>>
>> > >>>So yes it is something related to the VM. But I do not understand
>> > >>>how #primDigitCompare could be so slow? As you say, maybe something
>> > >>>is wrong with it.
>> > >>>
>> > >>>Thank you, I am glad that I asked for a review :-)
>> > >>>
>> > >>>Dave
>> > >>>
>> > >>>
>> > >>>On Thu, Apr 14, 2016 at 01:45:42AM +0200, Levente Uzonyi wrote:
>> > >>>>Hi Dave,
>> > >>>>
>> > >>>>I guess this is a VM related issue. On 64-bit Spur
>> > >>>>[ 7432154326465436 digitCompare: 8432154326465436 ] bench.
>> > >>>>returns '5,420,000 per second. 184 nanoseconds per run.'.
>> > >>>>
>> > >>>>Removing the primitive call from #digitCompare: the number goes up:
>> > >>>>'20,200,000 per second. 49.4 nanoseconds per run.'.
>> > >>>>
>> > >>>>You might think that it's okay because the JIT is so good. But we
>> have
>> > >>>>another primitive to compare two bytes-objects. One which ought be
>> > >>>>slower than #primDigitCompare:, because it maps the bytes before
>> doing
>> > >>>>the
>> > >>>>comparison. I even subtracted two from its result to match the
>> result of
>> > >>>>#digitCompare:
>> > >>>>
>> > >>>>| order |
>> > >>>>order := (0 to: 255) as: ByteArray.
>> > >>>>[ (ByteString compare: 7432154326465436 with: 8432154326465436
>> collated:
>> > >>>>order) - 2 ] bench.
>> > >>>>
>> > >>>>But it's still about twice as quick as #primDigitCompare:.
>> > >>>>'9,590,000 per second. 104 nanoseconds per run.'.
>> > >>>>So, something must be wrong with #primDigitCompare:.
>> > >>>>
>> > >>>>Levente
>> > >>>>
>> > >>>>On Wed, 13 Apr 2016, David T. Lewis wrote:
>> > >>>>
>> > >>>>>I would appreciate a review before moving this to trunk.
>> > >>>>>
>> > >>>>>Background:
>> > >>>>>
>> > >>>>>In UTCDateAndTime, most things are much faster than the trunk
>> version.
>> > >>>>>However, DataAndTime equality check did not improve for 32-bit
>> images,
>> > >>>>>so I ran it under AndreasSystemProfiler. Profiling showed that
>> large
>> > >>>>>integer equality checks spends time mostly in primDigitCompare,
>> which
>> > >>>>>is inefficient when only a simple byte comparison is needed.
>> > >>>>>
>> > >>>>>Here is the performance difference that I see on my system, 32-bit
>> > >>>>>trunk
>> > >>>>>Spur on Linux:
>> > >>>>>
>> > >>>>>| a b c d |
>> > >>>>>a := 7432154326465436. "a big number"
>> > >>>>>b := a + 1. "low order digit changed"
>> > >>>>>c := 8432154326465436. "high order digit changed"
>> > >>>>>d := 7432154026465436. "a digit in the middle changed"
>> > >>>>>
>> > >>>>>"Base performance in the trunk image"
>> > >>>>>Time millisecondsToRun: [500000000 timesRepeat: [ a = b ]]. "==>
>> 63733"
>> > >>>>>Time millisecondsToRun: [500000000 timesRepeat: [ a = c ]]. "==>
>> 63152"
>> > >>>>>Time millisecondsToRun: [500000000 timesRepeat: [ a = d ]]. "==>
>> 63581"
>> > >>>>>
>> > >>>>>"Performance after adding LargePositiveInteger>>="
>> > >>>>>Time millisecondsToRun: [500000000 timesRepeat: [ a = b ]]. "==>
>> 4676"
>> > >>>>>Time millisecondsToRun: [500000000 timesRepeat: [ a = c ]]. "==>
>> 4883"
>> > >>>>>Time millisecondsToRun: [500000000 timesRepeat: [ a = d ]]. "==>
>> 4512"
>> > >>>>>
>> > >>>>>Dave
>> > >>>>>
>> > >>>>>
>> > >>>>>
>> > >>>>>On Wed, Apr 13, 2016 at 10:57:28PM +0000,
>> commits at source.squeak.org
>> > >>>>>wrote:
>> > >>>>>>David T. Lewis uploaded a new version of Kernel to project The
>> Inbox:
>> > >>>>>>http://source.squeak.org/inbox/Kernel-dtl.1015.mcz
>> > >>>>>>
>> > >>>>>>==================== Summary ====================
>> > >>>>>>
>> > >>>>>>Name: Kernel-dtl.1015
>> > >>>>>>Author: dtl
>> > >>>>>>Time: 13 April 2016, 6:57:22.56608 pm
>> > >>>>>>UUID: bd849f91-9b00-45c5-b2ab-891b420bde5e
>> > >>>>>>Ancestors: Kernel-mt.1014
>> > >>>>>>
>> > >>>>>>Make large integer equality test be about 13 times faster.
>> Implement
>> > >>>>>>#=
>> > >>>>>>in LargePositiveInteger, and use digitAt: (primitive 60) for the
>> > >>>>>>comparison.
>> > >>>>>>
>> > >>>>>>=============== Diff against Kernel-mt.1014 ===============
>> > >>>>>>
>> > >>>>>>Item was added:
>> > >>>>>>+ ----- Method: LargePositiveInteger>>= (in category 'comparing')
>> > >>>>>>-----
>> > >>>>>>+ = aNumber
>> > >>>>>>+
>> > >>>>>>+       aNumber class == self class ifTrue: [
>> > >>>>>>+               aNumber size = self size ifFalse: [ ^false ].
>> > >>>>>>+               self size to: 1 by: -1 do: [ :i | (aNumber
>> digitAt:
>> > >>>>>>i) =
>> > >>>>>>(self digitAt: i) ifFalse: [ ^ false ] ].
>> > >>>>>>+               ^ true ].
>> > >>>>>>+       aNumber isInteger ifTrue: [ ^false ].
>> > >>>>>>+       aNumber isNumber ifFalse: [ ^false ].
>> > >>>>>>+       ^aNumber adaptToInteger: self andCompare: #=!
>> > >>>>>>
>> > >>>>>
>> > >>>>>
>> > >>
>> > >>
>> > >
>> > >
>>
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.squeakfoundation.org/pipermail/vm-dev/attachments/20160416/2abf04c7/attachment-0001.htm