[Vm-dev] Re: primitiveDigitCompare is slow (was: Re: [squeak-dev]
The Inbox: Kernel-dtl.1015.mcz)
David T. Lewis
lewis at mail.msen.com
Sat Apr 16 16:52:20 UTC 2016
It does seem odd that it would be different on Spur. You suggested that
compiler optimization might be set differently for the two plugins. I
cannot check (I'm on an Ubuntu laptop, cannot build Spur due to some
kind of autotools problem that I can't figure out right now), but if you
are able to do a VM compile and save the console output to a file, then
I think you can look at the CFLAGS setting for the two plugins ($ nohup ./mvm,
but first comment out the part of mvm that asks if you want to clean).
Aside from that, the actual generated code for primDigitCompare is
doing quite a lot of stuff before it ever gets around to comparing the
digits. I suspect it is not a very efficient primitive.
Dave
On Fri, Apr 15, 2016 at 05:52:02PM +0200, Levente Uzonyi wrote:
>
> This thread got derailed, and I feel like we should go back to its
> main point, which was that #digitCompare: using #primitiveDigitCompare is
> slow.
> It's so slow, that #primitiveCompareString is about twice as quick in
> Spur VMs, which is odd.
>
> Levente
>
> On Thu, 14 Apr 2016, Levente Uzonyi wrote:
>
> >Hi Dave,
> >
> >I dag a bit deeper into this problem, and I found that it's been around
> >for ages. I compared the two primitives in three older images.
> >The first one is Squeak 4.2 running on Cog r2714:
> >
> >[ 7432154326465436 digitCompare: 8432154326465436 ] bench.
> >'6,880,000 per second.'
> >
> >| order |
> >order := (0 to: 255) as: ByteArray.
> >[ (ByteString compare: 7432154326465436 with: 8432154326465436 collated:
> >order) - 2 ] bench.
> >'11,200,000 per second.'
> >
> >The next one was an even older Cog VM running an image updated from
> >3.10.2. The VM didn't have the primitives required to fetch the VM
> >information. The result were '8.22459348130374e6 per second.' and
> >'1.19677724910036e7 per second.', respectively.
> >
> >The last image was a closure-enabled 3.10.2 running on the classic
> >Interpreter VM. The results were '3.911442911417717e6 per second.' and
> >'4.84567866426715e6 per second.', respectively.
> >
> >Since in all VMs the seemingly more complex code (primitiveCompareString
> >with a subtraction) was quicker than the simpler code
> >(primitiveDigitCompare), I suspect that LargeIntegersPlugin is compiled
> >with less aggressive optimization than MiscPrimitivePlugin.
> >
> >What's also interesting is that, based on these benchmarks, the
> >performance got worse over the years. I think invoking a primitive has its
> >cost in newer VMs.
> >
> >Levente
> >
> >On Thu, 14 Apr 2016, David T. Lewis wrote:
> >
> >>Oops, I just realized that my "interpreter VM" results were from a
> >>debugging
> >>VM with compiler optimization off (too many VMs, sorry). Here are the
> >>results
> >>for a more representative interpreter VM. I can't explain the variation,
> >>but
> >>one conclusion is clear: my suggested "optimization" in the inbox is a
> >>bad idea
> >>(and I will move it to the treated inbox).
> >>
> >> "Base performance in the trunk level V3 image with interpreter VM"
> >> Time millisecondsToRun: [100000000 timesRepeat: [ a = b ]]. "==> 26669"
> >> Time millisecondsToRun: [100000000 timesRepeat: [ a = c ]]. "==> 25052"
> >> Time millisecondsToRun: [100000000 timesRepeat: [ a = d ]]. "==> 25275"
> >>
> >> "Performance after adding LargePositiveInteger>>="
> >> Time millisecondsToRun: [100000000 timesRepeat: [ a = b ]]. "==> 59224"
> >> Time millisecondsToRun: [100000000 timesRepeat: [ a = c ]]. "==> 27824"
> >> Time millisecondsToRun: [100000000 timesRepeat: [ a = d ]]. "==> 44324"
> >>
> >>Dave
> >>
> >>
> >>
> >>On Wed, Apr 13, 2016 at 08:25:33PM -0400, David T. Lewis wrote:
> >>>Hi Levente,
> >>>
> >>>I think you may be right. I repeated my test with an interpreter VM on a
> >>>32-bit image (with loop count smaller because the interpreter VM is
> >>>slower
> >>>than Spur). The change that I put in the inbox does not have any benefit
> >>>for the interpreter VM:
> >>>
> >>>
> >>> a := 7432154326465436. "a big number"
> >>> b := a + 1. "low order digit changed"
> >>> c := 8432154326465436. "high order digit changed"
> >>> d := 7432154026465436. "a digit in the middle changed"
> >>>
> >>> "Base performance in the trunk level V3 image with interpreter VM"
> >>> Time millisecondsToRun: [5000000 timesRepeat: [ a = b ]]. "==> 3844"
> >>> Time millisecondsToRun: [5000000 timesRepeat: [ a = c ]]. "==> 3786"
> >>> Time millisecondsToRun: [5000000 timesRepeat: [ a = d ]].. "==> 3800"
> >>>
> >>> "Performance after adding LargePositiveInteger>>="
> >>> Time millisecondsToRun: [5000000 timesRepeat: [ a = b ]]. "==> 3868"
> >>> Time millisecondsToRun: [5000000 timesRepeat: [ a = c ]]. "==> 3775"
> >>> Time millisecondsToRun: [5000000 timesRepeat: [ a = d ]]. "==> 3770"
> >>>
> >>>So yes it is something related to the VM. But I do not understand
> >>>how #primDigitCompare could be so slow? As you say, maybe something
> >>>is wrong with it.
> >>>
> >>>Thank you, I am glad that I asked for a review :-)
> >>>
> >>>Dave
> >>>
> >>>
> >>>On Thu, Apr 14, 2016 at 01:45:42AM +0200, Levente Uzonyi wrote:
> >>>>Hi Dave,
> >>>>
> >>>>I guess this is a VM related issue. On 64-bit Spur
> >>>>[ 7432154326465436 digitCompare: 8432154326465436 ] bench.
> >>>>returns '5,420,000 per second. 184 nanoseconds per run.'.
> >>>>
> >>>>Removing the primitive call from #digitCompare: the number goes up:
> >>>>'20,200,000 per second. 49.4 nanoseconds per run.'.
> >>>>
> >>>>You might think that it's okay because the JIT is so good. But we have
> >>>>another primitive to compare two bytes-objects. One which ought be
> >>>>slower than #primDigitCompare:, because it maps the bytes before doing
> >>>>the
> >>>>comparison. I even subtracted two from its result to match the result of
> >>>>#digitCompare:
> >>>>
> >>>>| order |
> >>>>order := (0 to: 255) as: ByteArray.
> >>>>[ (ByteString compare: 7432154326465436 with: 8432154326465436 collated:
> >>>>order) - 2 ] bench.
> >>>>
> >>>>But it's still about twice as quick as #primDigitCompare:.
> >>>>'9,590,000 per second. 104 nanoseconds per run.'.
> >>>>So, something must be wrong with #primDigitCompare:.
> >>>>
> >>>>Levente
> >>>>
> >>>>On Wed, 13 Apr 2016, David T. Lewis wrote:
> >>>>
> >>>>>I would appreciate a review before moving this to trunk.
> >>>>>
> >>>>>Background:
> >>>>>
> >>>>>In UTCDateAndTime, most things are much faster than the trunk version.
> >>>>>However, DataAndTime equality check did not improve for 32-bit images,
> >>>>>so I ran it under AndreasSystemProfiler. Profiling showed that large
> >>>>>integer equality checks spends time mostly in primDigitCompare, which
> >>>>>is inefficient when only a simple byte comparison is needed.
> >>>>>
> >>>>>Here is the performance difference that I see on my system, 32-bit
> >>>>>trunk
> >>>>>Spur on Linux:
> >>>>>
> >>>>>| a b c d |
> >>>>>a := 7432154326465436. "a big number"
> >>>>>b := a + 1. "low order digit changed"
> >>>>>c := 8432154326465436. "high order digit changed"
> >>>>>d := 7432154026465436. "a digit in the middle changed"
> >>>>>
> >>>>>"Base performance in the trunk image"
> >>>>>Time millisecondsToRun: [500000000 timesRepeat: [ a = b ]]. "==> 63733"
> >>>>>Time millisecondsToRun: [500000000 timesRepeat: [ a = c ]]. "==> 63152"
> >>>>>Time millisecondsToRun: [500000000 timesRepeat: [ a = d ]]. "==> 63581"
> >>>>>
> >>>>>"Performance after adding LargePositiveInteger>>="
> >>>>>Time millisecondsToRun: [500000000 timesRepeat: [ a = b ]]. "==> 4676"
> >>>>>Time millisecondsToRun: [500000000 timesRepeat: [ a = c ]]. "==> 4883"
> >>>>>Time millisecondsToRun: [500000000 timesRepeat: [ a = d ]]. "==> 4512"
> >>>>>
> >>>>>Dave
> >>>>>
> >>>>>
> >>>>>
> >>>>>On Wed, Apr 13, 2016 at 10:57:28PM +0000, commits at source.squeak.org
> >>>>>wrote:
> >>>>>>David T. Lewis uploaded a new version of Kernel to project The Inbox:
> >>>>>>http://source.squeak.org/inbox/Kernel-dtl.1015.mcz
> >>>>>>
> >>>>>>==================== Summary ====================
> >>>>>>
> >>>>>>Name: Kernel-dtl.1015
> >>>>>>Author: dtl
> >>>>>>Time: 13 April 2016, 6:57:22.56608 pm
> >>>>>>UUID: bd849f91-9b00-45c5-b2ab-891b420bde5e
> >>>>>>Ancestors: Kernel-mt.1014
> >>>>>>
> >>>>>>Make large integer equality test be about 13 times faster. Implement
> >>>>>>#=
> >>>>>>in LargePositiveInteger, and use digitAt: (primitive 60) for the
> >>>>>>comparison.
> >>>>>>
> >>>>>>=============== Diff against Kernel-mt.1014 ===============
> >>>>>>
> >>>>>>Item was added:
> >>>>>>+ ----- Method: LargePositiveInteger>>= (in category 'comparing')
> >>>>>>-----
> >>>>>>+ = aNumber
> >>>>>>+
> >>>>>>+ aNumber class == self class ifTrue: [
> >>>>>>+ aNumber size = self size ifFalse: [ ^false ].
> >>>>>>+ self size to: 1 by: -1 do: [ :i | (aNumber digitAt:
> >>>>>>i) =
> >>>>>>(self digitAt: i) ifFalse: [ ^ false ] ].
> >>>>>>+ ^ true ].
> >>>>>>+ aNumber isInteger ifTrue: [ ^false ].
> >>>>>>+ aNumber isNumber ifFalse: [ ^false ].
> >>>>>>+ ^aNumber adaptToInteger: self andCompare: #=!
> >>>>>>
> >>>>>
> >>>>>
> >>
> >>
> >
> >
More information about the Vm-dev
mailing list