[Vm-dev] Re: primitiveDigitCompare is slow (was: Re: [squeak-dev] The Inbox: Kernel-dtl.1015.mcz)

Sat Apr 16 16:52:20 UTC 2016

It does seem odd that it would be different on Spur. You suggested that
compiler optimization might be set differently for the two plugins. I
cannot check (I'm on an Ubuntu laptop, cannot build Spur due to some
kind of autotools problem that I can't figure out right now), but if you
are able to do a VM compile and save the console output to a file, then
I think you can look at the CFLAGS setting for the two plugins ($ nohup ./mvm,
but first comment out the part of mvm that asks if you want to clean).

Aside from that, the actual generated code for primDigitCompare is
doing quite a lot of stuff before it ever gets around to comparing the
digits. I suspect it is not a very efficient primitive.

Dave

On Fri, Apr 15, 2016 at 05:52:02PM +0200, Levente Uzonyi wrote:
> 
> This thread got derailed, and I feel like we should go back to its 
> main point, which was that #digitCompare: using #primitiveDigitCompare is 
> slow. 
> It's so slow, that #primitiveCompareString is about twice as quick in 
> Spur VMs, which is odd.
> 
> Levente
> 
> On Thu, 14 Apr 2016, Levente Uzonyi wrote:
> 
> >Hi Dave,
> >
> >I dag a bit deeper into this problem, and I found that it's been around 
> >for ages. I compared the two primitives in three older images.
> >The first one is Squeak 4.2 running on Cog r2714:
> >
> >[ 7432154326465436 digitCompare: 8432154326465436 ] bench.
> >'6,880,000 per second.'
> >
> >| order |
> >order := (0 to: 255) as: ByteArray.
> >[ (ByteString compare: 7432154326465436 with: 8432154326465436 collated: 
> >order) - 2 ] bench.
> >'11,200,000 per second.'
> >
> >The next one was an even older Cog VM running an image updated from 
> >3.10.2. The VM didn't have the primitives required to fetch the VM 
> >information. The result were '8.22459348130374e6 per second.' and
> >'1.19677724910036e7 per second.', respectively.
> >
> >The last image was a closure-enabled 3.10.2 running on the classic 
> >Interpreter VM. The results were '3.911442911417717e6 per second.' and
> >'4.84567866426715e6 per second.', respectively.
> >
> >Since in all VMs the seemingly more complex code (primitiveCompareString 
> >with a subtraction) was quicker than the simpler code 
> >(primitiveDigitCompare), I suspect that LargeIntegersPlugin is compiled
> >with less aggressive optimization than MiscPrimitivePlugin.
> >
> >What's also interesting is that, based on these benchmarks, the 
> >performance got worse over the years. I think invoking a primitive has its 
> >cost in newer VMs.
> >
> >Levente
> >
> >On Thu, 14 Apr 2016, David T. Lewis wrote:
> >
> >>Oops, I just realized that my "interpreter VM" results were from a 
> >>debugging
> >>VM with compiler optimization off (too many VMs, sorry). Here are the 
> >>results
> >>for a more representative interpreter VM. I can't explain the variation, 
> >>but
> >>one conclusion is clear: my suggested "optimization" in the inbox is a 
> >>bad idea
> >>(and I will move it to the treated inbox).
> >>
> >> "Base performance in the trunk level V3 image with interpreter VM"
> >> Time millisecondsToRun: [100000000 timesRepeat: [ a = b ]]. "==> 26669"
> >> Time millisecondsToRun: [100000000 timesRepeat: [ a = c ]]. "==> 25052"
> >> Time millisecondsToRun: [100000000 timesRepeat: [ a = d ]]. "==> 25275"
> >>
> >> "Performance after adding LargePositiveInteger>>="
> >> Time millisecondsToRun: [100000000 timesRepeat: [ a = b ]]. "==> 59224"
> >> Time millisecondsToRun: [100000000 timesRepeat: [ a = c ]].  "==> 27824"
> >> Time millisecondsToRun: [100000000 timesRepeat: [ a = d ]]. "==> 44324"
> >>
> >>Dave
> >>
> >>
> >>
> >>On Wed, Apr 13, 2016 at 08:25:33PM -0400, David T. Lewis wrote:
> >>>Hi Levente,
> >>>
> >>>I think you may be right. I repeated my test with an interpreter VM on a
> >>>32-bit image (with loop count smaller because the interpreter VM is 
> >>>slower
> >>>than Spur). The change that I put in the inbox does not have any benefit
> >>>for the interpreter VM:
> >>>
> >>>
> >>>  a := 7432154326465436. "a big number"
> >>>  b := a + 1. "low order digit changed"
> >>>  c := 8432154326465436. "high order digit changed"
> >>>  d := 7432154026465436. "a digit in the middle changed"
> >>>
> >>>   "Base performance in the trunk level V3 image with interpreter VM"
> >>>  Time millisecondsToRun: [5000000 timesRepeat: [ a = b ]]. "==> 3844"
> >>>  Time millisecondsToRun: [5000000 timesRepeat: [ a = c ]]. "==> 3786"
> >>>  Time millisecondsToRun: [5000000 timesRepeat: [ a = d ]].. "==> 3800"
> >>>
> >>>  "Performance after adding LargePositiveInteger>>="
> >>>  Time millisecondsToRun: [5000000 timesRepeat: [ a = b ]]. "==> 3868"
> >>>  Time millisecondsToRun: [5000000 timesRepeat: [ a = c ]]. "==> 3775"
> >>>  Time millisecondsToRun: [5000000 timesRepeat: [ a = d ]]. "==> 3770"
> >>>
> >>>So yes it is something related to the VM. But I do not understand
> >>>how #primDigitCompare could be so slow? As you say, maybe something
> >>>is wrong with it.
> >>>
> >>>Thank you, I am glad that I asked for a review :-)
> >>>
> >>>Dave
> >>>
> >>>
> >>>On Thu, Apr 14, 2016 at 01:45:42AM +0200, Levente Uzonyi wrote:
> >>>>Hi Dave,
> >>>>
> >>>>I guess this is a VM related issue. On 64-bit Spur
> >>>>[ 7432154326465436 digitCompare: 8432154326465436 ] bench.
> >>>>returns '5,420,000 per second. 184 nanoseconds per run.'.
> >>>>
> >>>>Removing the primitive call from #digitCompare: the number goes up:
> >>>>'20,200,000 per second. 49.4 nanoseconds per run.'.
> >>>>
> >>>>You might think that it's okay because the JIT is so good. But we have
> >>>>another primitive to compare two bytes-objects. One which ought be
> >>>>slower than #primDigitCompare:, because it maps the bytes before doing 
> >>>>the
> >>>>comparison. I even subtracted two from its result to match the result of
> >>>>#digitCompare:
> >>>>
> >>>>| order |
> >>>>order := (0 to: 255) as: ByteArray.
> >>>>[ (ByteString compare: 7432154326465436 with: 8432154326465436 collated:
> >>>>order) - 2 ] bench.
> >>>>
> >>>>But it's still about twice as quick as #primDigitCompare:.
> >>>>'9,590,000 per second. 104 nanoseconds per run.'.
> >>>>So, something must be wrong with #primDigitCompare:.
> >>>>
> >>>>Levente
> >>>>
> >>>>On Wed, 13 Apr 2016, David T. Lewis wrote:
> >>>>
> >>>>>I would appreciate a review before moving this to trunk.
> >>>>>
> >>>>>Background:
> >>>>>
> >>>>>In UTCDateAndTime, most things are much faster than the trunk version.
> >>>>>However, DataAndTime equality check did not improve for 32-bit images,
> >>>>>so I ran it under AndreasSystemProfiler. Profiling showed that large
> >>>>>integer equality checks spends time mostly in primDigitCompare, which
> >>>>>is inefficient when only a simple byte comparison is needed.
> >>>>>
> >>>>>Here is the performance difference that I see on my system, 32-bit 
> >>>>>trunk
> >>>>>Spur on Linux:
> >>>>>
> >>>>>| a b c d |
> >>>>>a := 7432154326465436. "a big number"
> >>>>>b := a + 1. "low order digit changed"
> >>>>>c := 8432154326465436. "high order digit changed"
> >>>>>d := 7432154026465436. "a digit in the middle changed"
> >>>>>
> >>>>>"Base performance in the trunk image"
> >>>>>Time millisecondsToRun: [500000000 timesRepeat: [ a = b ]]. "==> 63733"
> >>>>>Time millisecondsToRun: [500000000 timesRepeat: [ a = c ]]. "==> 63152"
> >>>>>Time millisecondsToRun: [500000000 timesRepeat: [ a = d ]]. "==> 63581"
> >>>>>
> >>>>>"Performance after adding LargePositiveInteger>>="
> >>>>>Time millisecondsToRun: [500000000 timesRepeat: [ a = b ]]. "==> 4676"
> >>>>>Time millisecondsToRun: [500000000 timesRepeat: [ a = c ]]. "==> 4883"
> >>>>>Time millisecondsToRun: [500000000 timesRepeat: [ a = d ]]. "==> 4512"
> >>>>>
> >>>>>Dave
> >>>>>
> >>>>>
> >>>>>
> >>>>>On Wed, Apr 13, 2016 at 10:57:28PM +0000, commits at source.squeak.org 
> >>>>>wrote:
> >>>>>>David T. Lewis uploaded a new version of Kernel to project The Inbox:
> >>>>>>http://source.squeak.org/inbox/Kernel-dtl.1015.mcz
> >>>>>>
> >>>>>>==================== Summary ====================
> >>>>>>
> >>>>>>Name: Kernel-dtl.1015
> >>>>>>Author: dtl
> >>>>>>Time: 13 April 2016, 6:57:22.56608 pm
> >>>>>>UUID: bd849f91-9b00-45c5-b2ab-891b420bde5e
> >>>>>>Ancestors: Kernel-mt.1014
> >>>>>>
> >>>>>>Make large integer equality test be about 13 times faster. Implement 
> >>>>>>#=
> >>>>>>in LargePositiveInteger, and use digitAt: (primitive 60) for the
> >>>>>>comparison.
> >>>>>>
> >>>>>>=============== Diff against Kernel-mt.1014 ===============
> >>>>>>
> >>>>>>Item was added:
> >>>>>>+ ----- Method: LargePositiveInteger>>= (in category 'comparing') 
> >>>>>>-----
> >>>>>>+ = aNumber
> >>>>>>+
> >>>>>>+ 	aNumber class == self class ifTrue: [
> >>>>>>+ 		aNumber size = self size ifFalse: [ ^false ].
> >>>>>>+ 		self size to: 1 by: -1 do: [ :i | (aNumber digitAt: 
> >>>>>>i) =
> >>>>>>(self digitAt: i) ifFalse: [ ^ false ] ].
> >>>>>>+ 		^ true ].
> >>>>>>+ 	aNumber isInteger ifTrue: [ ^false ].
> >>>>>>+ 	aNumber isNumber ifFalse: [ ^false ].
> >>>>>>+ 	^aNumber adaptToInteger: self andCompare: #=!
> >>>>>>
> >>>>>
> >>>>>
> >>
> >>
> >
> >