[Vm-dev] Re: primitiveDigitCompare is slow (was: Re: [squeak-dev] The Inbox: Kernel-dtl.1015.mcz)

Sat Apr 16 17:37:20 UTC 2016

Hi,
the primitive does nothing very special
1) check for SmallInteger cases first, for quick return
2) check for LargeIntegers length then if both receiver & argument are Large
3) check for LargeIntegers digits then if both have same length

None of these 3 steps is expected to be slow.

A bleeding age VM does the 3rd step using 32 bits limbs instead of 8bits
but this does not change a thing about performances ratio, it could only
make a difference for giant integers, these ones fit on 56 bits...
I got these with 32bits Cog Sput MacOSX :

[ 7432154326465436 digitCompare: 8432154326465436 ] bench.
 '8,980,000 per second. 111 nanoseconds per run.'

| order |
order := (0 to: 255) as: ByteArray.
[ (ByteString compare: 7432154326465436 with: 8432154326465436 collated:
order) - 2 ] bench.
 '16,400,000 per second. 60.8 nanoseconds per run.'

Obviously, most time is spent elsewhere, it would be interesting to know
where exactly.

Nicolas

2016-04-16 18:52 GMT+02:00 David T. Lewis <lewis at mail.msen.com>:

>
> It does seem odd that it would be different on Spur. You suggested that
> compiler optimization might be set differently for the two plugins. I
> cannot check (I'm on an Ubuntu laptop, cannot build Spur due to some
> kind of autotools problem that I can't figure out right now), but if you
> are able to do a VM compile and save the console output to a file, then
> I think you can look at the CFLAGS setting for the two plugins ($ nohup
> ./mvm,
> but first comment out the part of mvm that asks if you want to clean).
>
> Aside from that, the actual generated code for primDigitCompare is
> doing quite a lot of stuff before it ever gets around to comparing the
> digits. I suspect it is not a very efficient primitive.
>
> Dave
>
>
> On Fri, Apr 15, 2016 at 05:52:02PM +0200, Levente Uzonyi wrote:
> >
> > This thread got derailed, and I feel like we should go back to its
> > main point, which was that #digitCompare: using #primitiveDigitCompare is
> > slow.
> > It's so slow, that #primitiveCompareString is about twice as quick in
> > Spur VMs, which is odd.
> >
> > Levente
> >
> > On Thu, 14 Apr 2016, Levente Uzonyi wrote:
> >
> > >Hi Dave,
> > >
> > >I dag a bit deeper into this problem, and I found that it's been around
> > >for ages. I compared the two primitives in three older images.
> > >The first one is Squeak 4.2 running on Cog r2714:
> > >
> > >[ 7432154326465436 digitCompare: 8432154326465436 ] bench.
> > >'6,880,000 per second.'
> > >
> > >| order |
> > >order := (0 to: 255) as: ByteArray.
> > >[ (ByteString compare: 7432154326465436 with: 8432154326465436 collated:
> > >order) - 2 ] bench.
> > >'11,200,000 per second.'
> > >
> > >The next one was an even older Cog VM running an image updated from
> > >3.10.2. The VM didn't have the primitives required to fetch the VM
> > >information. The result were '8.22459348130374e6 per second.' and
> > >'1.19677724910036e7 per second.', respectively.
> > >
> > >The last image was a closure-enabled 3.10.2 running on the classic
> > >Interpreter VM. The results were '3.911442911417717e6 per second.' and
> > >'4.84567866426715e6 per second.', respectively.
> > >
> > >Since in all VMs the seemingly more complex code (primitiveCompareString
> > >with a subtraction) was quicker than the simpler code
> > >(primitiveDigitCompare), I suspect that LargeIntegersPlugin is compiled
> > >with less aggressive optimization than MiscPrimitivePlugin.
> > >
> > >What's also interesting is that, based on these benchmarks, the
> > >performance got worse over the years. I think invoking a primitive has
> its
> > >cost in newer VMs.
> > >
> > >Levente
> > >
> > >On Thu, 14 Apr 2016, David T. Lewis wrote:
> > >
> > >>Oops, I just realized that my "interpreter VM" results were from a
> > >>debugging
> > >>VM with compiler optimization off (too many VMs, sorry). Here are the
> > >>results
> > >>for a more representative interpreter VM. I can't explain the
> variation,
> > >>but
> > >>one conclusion is clear: my suggested "optimization" in the inbox is a
> > >>bad idea
> > >>(and I will move it to the treated inbox).
> > >>
> > >> "Base performance in the trunk level V3 image with interpreter VM"
> > >> Time millisecondsToRun: [100000000 timesRepeat: [ a = b ]]. "==>
> 26669"
> > >> Time millisecondsToRun: [100000000 timesRepeat: [ a = c ]]. "==>
> 25052"
> > >> Time millisecondsToRun: [100000000 timesRepeat: [ a = d ]]. "==>
> 25275"
> > >>
> > >> "Performance after adding LargePositiveInteger>>="
> > >> Time millisecondsToRun: [100000000 timesRepeat: [ a = b ]]. "==>
> 59224"
> > >> Time millisecondsToRun: [100000000 timesRepeat: [ a = c ]].  "==>
> 27824"
> > >> Time millisecondsToRun: [100000000 timesRepeat: [ a = d ]]. "==>
> 44324"
> > >>
> > >>Dave
> > >>
> > >>
> > >>
> > >>On Wed, Apr 13, 2016 at 08:25:33PM -0400, David T. Lewis wrote:
> > >>>Hi Levente,
> > >>>
> > >>>I think you may be right. I repeated my test with an interpreter VM
> on a
> > >>>32-bit image (with loop count smaller because the interpreter VM is
> > >>>slower
> > >>>than Spur). The change that I put in the inbox does not have any
> benefit
> > >>>for the interpreter VM:
> > >>>
> > >>>
> > >>>  a := 7432154326465436. "a big number"
> > >>>  b := a + 1. "low order digit changed"
> > >>>  c := 8432154326465436. "high order digit changed"
> > >>>  d := 7432154026465436. "a digit in the middle changed"
> > >>>
> > >>>   "Base performance in the trunk level V3 image with interpreter VM"
> > >>>  Time millisecondsToRun: [5000000 timesRepeat: [ a = b ]]. "==> 3844"
> > >>>  Time millisecondsToRun: [5000000 timesRepeat: [ a = c ]]. "==> 3786"
> > >>>  Time millisecondsToRun: [5000000 timesRepeat: [ a = d ]].. "==>
> 3800"
> > >>>
> > >>>  "Performance after adding LargePositiveInteger>>="
> > >>>  Time millisecondsToRun: [5000000 timesRepeat: [ a = b ]]. "==> 3868"
> > >>>  Time millisecondsToRun: [5000000 timesRepeat: [ a = c ]]. "==> 3775"
> > >>>  Time millisecondsToRun: [5000000 timesRepeat: [ a = d ]]. "==> 3770"
> > >>>
> > >>>So yes it is something related to the VM. But I do not understand
> > >>>how #primDigitCompare could be so slow? As you say, maybe something
> > >>>is wrong with it.
> > >>>
> > >>>Thank you, I am glad that I asked for a review :-)
> > >>>
> > >>>Dave
> > >>>
> > >>>
> > >>>On Thu, Apr 14, 2016 at 01:45:42AM +0200, Levente Uzonyi wrote:
> > >>>>Hi Dave,
> > >>>>
> > >>>>I guess this is a VM related issue. On 64-bit Spur
> > >>>>[ 7432154326465436 digitCompare: 8432154326465436 ] bench.
> > >>>>returns '5,420,000 per second. 184 nanoseconds per run.'.
> > >>>>
> > >>>>Removing the primitive call from #digitCompare: the number goes up:
> > >>>>'20,200,000 per second. 49.4 nanoseconds per run.'.
> > >>>>
> > >>>>You might think that it's okay because the JIT is so good. But we
> have
> > >>>>another primitive to compare two bytes-objects. One which ought be
> > >>>>slower than #primDigitCompare:, because it maps the bytes before
> doing
> > >>>>the
> > >>>>comparison. I even subtracted two from its result to match the
> result of
> > >>>>#digitCompare:
> > >>>>
> > >>>>| order |
> > >>>>order := (0 to: 255) as: ByteArray.
> > >>>>[ (ByteString compare: 7432154326465436 with: 8432154326465436
> collated:
> > >>>>order) - 2 ] bench.
> > >>>>
> > >>>>But it's still about twice as quick as #primDigitCompare:.
> > >>>>'9,590,000 per second. 104 nanoseconds per run.'.
> > >>>>So, something must be wrong with #primDigitCompare:.
> > >>>>
> > >>>>Levente
> > >>>>
> > >>>>On Wed, 13 Apr 2016, David T. Lewis wrote:
> > >>>>
> > >>>>>I would appreciate a review before moving this to trunk.
> > >>>>>
> > >>>>>Background:
> > >>>>>
> > >>>>>In UTCDateAndTime, most things are much faster than the trunk
> version.
> > >>>>>However, DataAndTime equality check did not improve for 32-bit
> images,
> > >>>>>so I ran it under AndreasSystemProfiler. Profiling showed that large
> > >>>>>integer equality checks spends time mostly in primDigitCompare,
> which
> > >>>>>is inefficient when only a simple byte comparison is needed.
> > >>>>>
> > >>>>>Here is the performance difference that I see on my system, 32-bit
> > >>>>>trunk
> > >>>>>Spur on Linux:
> > >>>>>
> > >>>>>| a b c d |
> > >>>>>a := 7432154326465436. "a big number"
> > >>>>>b := a + 1. "low order digit changed"
> > >>>>>c := 8432154326465436. "high order digit changed"
> > >>>>>d := 7432154026465436. "a digit in the middle changed"
> > >>>>>
> > >>>>>"Base performance in the trunk image"
> > >>>>>Time millisecondsToRun: [500000000 timesRepeat: [ a = b ]]. "==>
> 63733"
> > >>>>>Time millisecondsToRun: [500000000 timesRepeat: [ a = c ]]. "==>
> 63152"
> > >>>>>Time millisecondsToRun: [500000000 timesRepeat: [ a = d ]]. "==>
> 63581"
> > >>>>>
> > >>>>>"Performance after adding LargePositiveInteger>>="
> > >>>>>Time millisecondsToRun: [500000000 timesRepeat: [ a = b ]]. "==>
> 4676"
> > >>>>>Time millisecondsToRun: [500000000 timesRepeat: [ a = c ]]. "==>
> 4883"
> > >>>>>Time millisecondsToRun: [500000000 timesRepeat: [ a = d ]]. "==>
> 4512"
> > >>>>>
> > >>>>>Dave
> > >>>>>
> > >>>>>
> > >>>>>
> > >>>>>On Wed, Apr 13, 2016 at 10:57:28PM +0000, commits at source.squeak.org
> > >>>>>wrote:
> > >>>>>>David T. Lewis uploaded a new version of Kernel to project The
> Inbox:
> > >>>>>>http://source.squeak.org/inbox/Kernel-dtl.1015.mcz
> > >>>>>>
> > >>>>>>==================== Summary ====================
> > >>>>>>
> > >>>>>>Name: Kernel-dtl.1015
> > >>>>>>Author: dtl
> > >>>>>>Time: 13 April 2016, 6:57:22.56608 pm
> > >>>>>>UUID: bd849f91-9b00-45c5-b2ab-891b420bde5e
> > >>>>>>Ancestors: Kernel-mt.1014
> > >>>>>>
> > >>>>>>Make large integer equality test be about 13 times faster.
> Implement
> > >>>>>>#=
> > >>>>>>in LargePositiveInteger, and use digitAt: (primitive 60) for the
> > >>>>>>comparison.
> > >>>>>>
> > >>>>>>=============== Diff against Kernel-mt.1014 ===============
> > >>>>>>
> > >>>>>>Item was added:
> > >>>>>>+ ----- Method: LargePositiveInteger>>= (in category 'comparing')
> > >>>>>>-----
> > >>>>>>+ = aNumber
> > >>>>>>+
> > >>>>>>+       aNumber class == self class ifTrue: [
> > >>>>>>+               aNumber size = self size ifFalse: [ ^false ].
> > >>>>>>+               self size to: 1 by: -1 do: [ :i | (aNumber digitAt:
> > >>>>>>i) =
> > >>>>>>(self digitAt: i) ifFalse: [ ^ false ] ].
> > >>>>>>+               ^ true ].
> > >>>>>>+       aNumber isInteger ifTrue: [ ^false ].
> > >>>>>>+       aNumber isNumber ifFalse: [ ^false ].
> > >>>>>>+       ^aNumber adaptToInteger: self andCompare: #=!
> > >>>>>>
> > >>>>>
> > >>>>>
> > >>
> > >>
> > >
> > >
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.squeakfoundation.org/pipermail/vm-dev/attachments/20160416/6dfd999d/attachment-0001.htm