[Vm-dev] Re: primitiveDigitCompare is slow (was: Re: [squeak-dev] The Inbox: Kernel-dtl.1015.mcz)

Fri Apr 15 15:52:02 UTC 2016

This thread got derailed, and I feel like we should go back to its 
main point, which was that #digitCompare: using #primitiveDigitCompare is 
slow. 
It's so slow, that #primitiveCompareString is about twice as quick in 
Spur VMs, which is odd.

Levente

On Thu, 14 Apr 2016, Levente Uzonyi wrote:

> Hi Dave,
>
> I dag a bit deeper into this problem, and I found that it's been around for 
> ages. I compared the two primitives in three older images.
> The first one is Squeak 4.2 running on Cog r2714:
>
> [ 7432154326465436 digitCompare: 8432154326465436 ] bench.
> '6,880,000 per second.'
>
> | order |
> order := (0 to: 255) as: ByteArray.
> [ (ByteString compare: 7432154326465436 with: 8432154326465436 collated: 
> order) - 2 ] bench.
> '11,200,000 per second.'
>
> The next one was an even older Cog VM running an image updated from 3.10.2. 
> The VM didn't have the primitives required to fetch the VM information. The 
> result were '8.22459348130374e6 per second.' and
> '1.19677724910036e7 per second.', respectively.
>
> The last image was a closure-enabled 3.10.2 running on the classic 
> Interpreter VM. The results were '3.911442911417717e6 per second.' and
> '4.84567866426715e6 per second.', respectively.
>
> Since in all VMs the seemingly more complex code (primitiveCompareString with 
> a subtraction) was quicker than the simpler code (primitiveDigitCompare), I 
> suspect that LargeIntegersPlugin is compiled
> with less aggressive optimization than MiscPrimitivePlugin.
>
> What's also interesting is that, based on these benchmarks, the performance 
> got worse over the years. I think invoking a primitive has its cost in newer 
> VMs.
>
> Levente
>
> On Thu, 14 Apr 2016, David T. Lewis wrote:
>
>> Oops, I just realized that my "interpreter VM" results were from a 
>> debugging
>> VM with compiler optimization off (too many VMs, sorry). Here are the 
>> results
>> for a more representative interpreter VM. I can't explain the variation, 
>> but
>> one conclusion is clear: my suggested "optimization" in the inbox is a bad 
>> idea
>> (and I will move it to the treated inbox).
>>
>>  "Base performance in the trunk level V3 image with interpreter VM"
>>  Time millisecondsToRun: [100000000 timesRepeat: [ a = b ]]. "==> 26669"
>>  Time millisecondsToRun: [100000000 timesRepeat: [ a = c ]]. "==> 25052"
>>  Time millisecondsToRun: [100000000 timesRepeat: [ a = d ]]. "==> 25275"
>>
>>  "Performance after adding LargePositiveInteger>>="
>>  Time millisecondsToRun: [100000000 timesRepeat: [ a = b ]]. "==> 59224"
>>  Time millisecondsToRun: [100000000 timesRepeat: [ a = c ]].  "==> 27824"
>>  Time millisecondsToRun: [100000000 timesRepeat: [ a = d ]]. "==> 44324"
>> 
>> Dave
>> 
>> 
>> 
>> On Wed, Apr 13, 2016 at 08:25:33PM -0400, David T. Lewis wrote:
>>> Hi Levente,
>>> 
>>> I think you may be right. I repeated my test with an interpreter VM on a
>>> 32-bit image (with loop count smaller because the interpreter VM is slower
>>> than Spur). The change that I put in the inbox does not have any benefit
>>> for the interpreter VM:
>>> 
>>>
>>>   a := 7432154326465436. "a big number"
>>>   b := a + 1. "low order digit changed"
>>>   c := 8432154326465436. "high order digit changed"
>>>   d := 7432154026465436. "a digit in the middle changed"
>>>
>>>    "Base performance in the trunk level V3 image with interpreter VM"
>>>   Time millisecondsToRun: [5000000 timesRepeat: [ a = b ]]. "==> 3844"
>>>   Time millisecondsToRun: [5000000 timesRepeat: [ a = c ]]. "==> 3786"
>>>   Time millisecondsToRun: [5000000 timesRepeat: [ a = d ]].. "==> 3800"
>>>
>>>   "Performance after adding LargePositiveInteger>>="
>>>   Time millisecondsToRun: [5000000 timesRepeat: [ a = b ]]. "==> 3868"
>>>   Time millisecondsToRun: [5000000 timesRepeat: [ a = c ]]. "==> 3775"
>>>   Time millisecondsToRun: [5000000 timesRepeat: [ a = d ]]. "==> 3770"
>>> 
>>> So yes it is something related to the VM. But I do not understand
>>> how #primDigitCompare could be so slow? As you say, maybe something
>>> is wrong with it.
>>> 
>>> Thank you, I am glad that I asked for a review :-)
>>> 
>>> Dave
>>> 
>>> 
>>> On Thu, Apr 14, 2016 at 01:45:42AM +0200, Levente Uzonyi wrote:
>>>> Hi Dave,
>>>> 
>>>> I guess this is a VM related issue. On 64-bit Spur
>>>> [ 7432154326465436 digitCompare: 8432154326465436 ] bench.
>>>> returns '5,420,000 per second. 184 nanoseconds per run.'.
>>>> 
>>>> Removing the primitive call from #digitCompare: the number goes up:
>>>> '20,200,000 per second. 49.4 nanoseconds per run.'.
>>>> 
>>>> You might think that it's okay because the JIT is so good. But we have
>>>> another primitive to compare two bytes-objects. One which ought be
>>>> slower than #primDigitCompare:, because it maps the bytes before doing 
>>>> the
>>>> comparison. I even subtracted two from its result to match the result of
>>>> #digitCompare:
>>>> 
>>>> | order |
>>>> order := (0 to: 255) as: ByteArray.
>>>> [ (ByteString compare: 7432154326465436 with: 8432154326465436 collated:
>>>> order) - 2 ] bench.
>>>> 
>>>> But it's still about twice as quick as #primDigitCompare:.
>>>> '9,590,000 per second. 104 nanoseconds per run.'.
>>>> So, something must be wrong with #primDigitCompare:.
>>>> 
>>>> Levente
>>>> 
>>>> On Wed, 13 Apr 2016, David T. Lewis wrote:
>>>> 
>>>>> I would appreciate a review before moving this to trunk.
>>>>> 
>>>>> Background:
>>>>> 
>>>>> In UTCDateAndTime, most things are much faster than the trunk version.
>>>>> However, DataAndTime equality check did not improve for 32-bit images,
>>>>> so I ran it under AndreasSystemProfiler. Profiling showed that large
>>>>> integer equality checks spends time mostly in primDigitCompare, which
>>>>> is inefficient when only a simple byte comparison is needed.
>>>>> 
>>>>> Here is the performance difference that I see on my system, 32-bit trunk
>>>>> Spur on Linux:
>>>>> 
>>>>> | a b c d |
>>>>> a := 7432154326465436. "a big number"
>>>>> b := a + 1. "low order digit changed"
>>>>> c := 8432154326465436. "high order digit changed"
>>>>> d := 7432154026465436. "a digit in the middle changed"
>>>>> 
>>>>> "Base performance in the trunk image"
>>>>> Time millisecondsToRun: [500000000 timesRepeat: [ a = b ]]. "==> 63733"
>>>>> Time millisecondsToRun: [500000000 timesRepeat: [ a = c ]]. "==> 63152"
>>>>> Time millisecondsToRun: [500000000 timesRepeat: [ a = d ]]. "==> 63581"
>>>>> 
>>>>> "Performance after adding LargePositiveInteger>>="
>>>>> Time millisecondsToRun: [500000000 timesRepeat: [ a = b ]]. "==> 4676"
>>>>> Time millisecondsToRun: [500000000 timesRepeat: [ a = c ]]. "==> 4883"
>>>>> Time millisecondsToRun: [500000000 timesRepeat: [ a = d ]]. "==> 4512"
>>>>> 
>>>>> Dave
>>>>> 
>>>>> 
>>>>> 
>>>>> On Wed, Apr 13, 2016 at 10:57:28PM +0000, commits at source.squeak.org 
>>>>> wrote:
>>>>>> David T. Lewis uploaded a new version of Kernel to project The Inbox:
>>>>>> http://source.squeak.org/inbox/Kernel-dtl.1015.mcz
>>>>>> 
>>>>>> ==================== Summary ====================
>>>>>> 
>>>>>> Name: Kernel-dtl.1015
>>>>>> Author: dtl
>>>>>> Time: 13 April 2016, 6:57:22.56608 pm
>>>>>> UUID: bd849f91-9b00-45c5-b2ab-891b420bde5e
>>>>>> Ancestors: Kernel-mt.1014
>>>>>> 
>>>>>> Make large integer equality test be about 13 times faster. Implement #=
>>>>>> in LargePositiveInteger, and use digitAt: (primitive 60) for the
>>>>>> comparison.
>>>>>> 
>>>>>> =============== Diff against Kernel-mt.1014 ===============
>>>>>> 
>>>>>> Item was added:
>>>>>> + ----- Method: LargePositiveInteger>>= (in category 'comparing') -----
>>>>>> + = aNumber
>>>>>> +
>>>>>> + 	aNumber class == self class ifTrue: [
>>>>>> + 		aNumber size = self size ifFalse: [ ^false ].
>>>>>> + 		self size to: 1 by: -1 do: [ :i | (aNumber digitAt: 
>>>>>> i) =
>>>>>> (self digitAt: i) ifFalse: [ ^ false ] ].
>>>>>> + 		^ true ].
>>>>>> + 	aNumber isInteger ifTrue: [ ^false ].
>>>>>> + 	aNumber isNumber ifFalse: [ ^false ].
>>>>>> + 	^aNumber adaptToInteger: self andCompare: #=!
>>>>>> 
>>>>> 
>>>>> 
>> 
>> 
>
>