[Vm-dev] primitiveDigitCompare is slow (was: Re: [squeak-dev] The Inbox: Kernel-dtl.1015.mcz)

Levente Uzonyi leves at caesar.elte.hu
Thu Apr 14 17:04:47 UTC 2016


Hi Dave,

I dag a bit deeper into this problem, and I found that it's been around 
for ages. I compared the two primitives in three older images.
The first one is Squeak 4.2 running on Cog r2714:

[ 7432154326465436 digitCompare: 8432154326465436 ] bench.
'6,880,000 per second.'

| order |
order := (0 to: 255) as: ByteArray.
[ (ByteString compare: 7432154326465436 with: 8432154326465436 collated: order) - 2 ] bench.
'11,200,000 per second.'

The next one was an even older Cog VM running an image updated from 
3.10.2. The VM didn't have the primitives required to fetch the VM 
information. The result were '8.22459348130374e6 per second.' and
'1.19677724910036e7 per second.', respectively.

The last image was a closure-enabled 3.10.2 running on the classic 
Interpreter VM. The results were '3.911442911417717e6 per second.' and
'4.84567866426715e6 per second.', respectively.

Since in all VMs the seemingly more complex code (primitiveCompareString 
with a subtraction) was quicker than the simpler code 
(primitiveDigitCompare), I suspect that LargeIntegersPlugin is compiled
with less aggressive optimization than MiscPrimitivePlugin.

What's also interesting is that, based on these benchmarks, the 
performance got worse over the years. I think invoking a primitive has its 
cost in newer VMs.

Levente

On Thu, 14 Apr 2016, David T. Lewis wrote:

> Oops, I just realized that my "interpreter VM" results were from a debugging
> VM with compiler optimization off (too many VMs, sorry). Here are the results
> for a more representative interpreter VM. I can't explain the variation, but
> one conclusion is clear: my suggested "optimization" in the inbox is a bad idea
> (and I will move it to the treated inbox).
>
>  "Base performance in the trunk level V3 image with interpreter VM"
>  Time millisecondsToRun: [100000000 timesRepeat: [ a = b ]]. "==> 26669"
>  Time millisecondsToRun: [100000000 timesRepeat: [ a = c ]]. "==> 25052"
>  Time millisecondsToRun: [100000000 timesRepeat: [ a = d ]]. "==> 25275"
>
>  "Performance after adding LargePositiveInteger>>="
>  Time millisecondsToRun: [100000000 timesRepeat: [ a = b ]]. "==> 59224"
>  Time millisecondsToRun: [100000000 timesRepeat: [ a = c ]].  "==> 27824"
>  Time millisecondsToRun: [100000000 timesRepeat: [ a = d ]]. "==> 44324"
>
> Dave
>
>
>
> On Wed, Apr 13, 2016 at 08:25:33PM -0400, David T. Lewis wrote:
>> Hi Levente,
>>
>> I think you may be right. I repeated my test with an interpreter VM on a
>> 32-bit image (with loop count smaller because the interpreter VM is slower
>> than Spur). The change that I put in the inbox does not have any benefit
>> for the interpreter VM:
>>
>>
>>   a := 7432154326465436. "a big number"
>>   b := a + 1. "low order digit changed"
>>   c := 8432154326465436. "high order digit changed"
>>   d := 7432154026465436. "a digit in the middle changed"
>>
>>    "Base performance in the trunk level V3 image with interpreter VM"
>>   Time millisecondsToRun: [5000000 timesRepeat: [ a = b ]]. "==> 3844"
>>   Time millisecondsToRun: [5000000 timesRepeat: [ a = c ]]. "==> 3786"
>>   Time millisecondsToRun: [5000000 timesRepeat: [ a = d ]].. "==> 3800"
>>
>>   "Performance after adding LargePositiveInteger>>="
>>   Time millisecondsToRun: [5000000 timesRepeat: [ a = b ]]. "==> 3868"
>>   Time millisecondsToRun: [5000000 timesRepeat: [ a = c ]]. "==> 3775"
>>   Time millisecondsToRun: [5000000 timesRepeat: [ a = d ]]. "==> 3770"
>>
>> So yes it is something related to the VM. But I do not understand
>> how #primDigitCompare could be so slow? As you say, maybe something
>> is wrong with it.
>>
>> Thank you, I am glad that I asked for a review :-)
>>
>> Dave
>>
>>
>> On Thu, Apr 14, 2016 at 01:45:42AM +0200, Levente Uzonyi wrote:
>>> Hi Dave,
>>>
>>> I guess this is a VM related issue. On 64-bit Spur
>>> [ 7432154326465436 digitCompare: 8432154326465436 ] bench.
>>> returns '5,420,000 per second. 184 nanoseconds per run.'.
>>>
>>> Removing the primitive call from #digitCompare: the number goes up:
>>> '20,200,000 per second. 49.4 nanoseconds per run.'.
>>>
>>> You might think that it's okay because the JIT is so good. But we have
>>> another primitive to compare two bytes-objects. One which ought be
>>> slower than #primDigitCompare:, because it maps the bytes before doing the
>>> comparison. I even subtracted two from its result to match the result of
>>> #digitCompare:
>>>
>>> | order |
>>> order := (0 to: 255) as: ByteArray.
>>> [ (ByteString compare: 7432154326465436 with: 8432154326465436 collated:
>>> order) - 2 ] bench.
>>>
>>> But it's still about twice as quick as #primDigitCompare:.
>>> '9,590,000 per second. 104 nanoseconds per run.'.
>>> So, something must be wrong with #primDigitCompare:.
>>>
>>> Levente
>>>
>>> On Wed, 13 Apr 2016, David T. Lewis wrote:
>>>
>>>> I would appreciate a review before moving this to trunk.
>>>>
>>>> Background:
>>>>
>>>> In UTCDateAndTime, most things are much faster than the trunk version.
>>>> However, DataAndTime equality check did not improve for 32-bit images,
>>>> so I ran it under AndreasSystemProfiler. Profiling showed that large
>>>> integer equality checks spends time mostly in primDigitCompare, which
>>>> is inefficient when only a simple byte comparison is needed.
>>>>
>>>> Here is the performance difference that I see on my system, 32-bit trunk
>>>> Spur on Linux:
>>>>
>>>> | a b c d |
>>>> a := 7432154326465436. "a big number"
>>>> b := a + 1. "low order digit changed"
>>>> c := 8432154326465436. "high order digit changed"
>>>> d := 7432154026465436. "a digit in the middle changed"
>>>>
>>>> "Base performance in the trunk image"
>>>> Time millisecondsToRun: [500000000 timesRepeat: [ a = b ]]. "==> 63733"
>>>> Time millisecondsToRun: [500000000 timesRepeat: [ a = c ]]. "==> 63152"
>>>> Time millisecondsToRun: [500000000 timesRepeat: [ a = d ]]. "==> 63581"
>>>>
>>>> "Performance after adding LargePositiveInteger>>="
>>>> Time millisecondsToRun: [500000000 timesRepeat: [ a = b ]]. "==> 4676"
>>>> Time millisecondsToRun: [500000000 timesRepeat: [ a = c ]]. "==> 4883"
>>>> Time millisecondsToRun: [500000000 timesRepeat: [ a = d ]]. "==> 4512"
>>>>
>>>> Dave
>>>>
>>>>
>>>>
>>>> On Wed, Apr 13, 2016 at 10:57:28PM +0000, commits at source.squeak.org wrote:
>>>>> David T. Lewis uploaded a new version of Kernel to project The Inbox:
>>>>> http://source.squeak.org/inbox/Kernel-dtl.1015.mcz
>>>>>
>>>>> ==================== Summary ====================
>>>>>
>>>>> Name: Kernel-dtl.1015
>>>>> Author: dtl
>>>>> Time: 13 April 2016, 6:57:22.56608 pm
>>>>> UUID: bd849f91-9b00-45c5-b2ab-891b420bde5e
>>>>> Ancestors: Kernel-mt.1014
>>>>>
>>>>> Make large integer equality test be about 13 times faster. Implement #=
>>>>> in LargePositiveInteger, and use digitAt: (primitive 60) for the
>>>>> comparison.
>>>>>
>>>>> =============== Diff against Kernel-mt.1014 ===============
>>>>>
>>>>> Item was added:
>>>>> + ----- Method: LargePositiveInteger>>= (in category 'comparing') -----
>>>>> + = aNumber
>>>>> +
>>>>> + 	aNumber class == self class ifTrue: [
>>>>> + 		aNumber size = self size ifFalse: [ ^false ].
>>>>> + 		self size to: 1 by: -1 do: [ :i | (aNumber digitAt: i) =
>>>>> (self digitAt: i) ifFalse: [ ^ false ] ].
>>>>> + 		^ true ].
>>>>> + 	aNumber isInteger ifTrue: [ ^false ].
>>>>> + 	aNumber isNumber ifFalse: [ ^false ].
>>>>> + 	^aNumber adaptToInteger: self andCompare: #=!
>>>>>
>>>>
>>>>
>
>


More information about the Vm-dev mailing list