[Vm-dev] Re: primitiveDigitCompare is slow (was: Re: [squeak-dev]
The Inbox: Kernel-dtl.1015.mcz)
Levente Uzonyi
leves at caesar.elte.hu
Fri Apr 15 15:52:02 UTC 2016
This thread got derailed, and I feel like we should go back to its
main point, which was that #digitCompare: using #primitiveDigitCompare is
slow.
It's so slow, that #primitiveCompareString is about twice as quick in
Spur VMs, which is odd.
Levente
On Thu, 14 Apr 2016, Levente Uzonyi wrote:
> Hi Dave,
>
> I dag a bit deeper into this problem, and I found that it's been around for
> ages. I compared the two primitives in three older images.
> The first one is Squeak 4.2 running on Cog r2714:
>
> [ 7432154326465436 digitCompare: 8432154326465436 ] bench.
> '6,880,000 per second.'
>
> | order |
> order := (0 to: 255) as: ByteArray.
> [ (ByteString compare: 7432154326465436 with: 8432154326465436 collated:
> order) - 2 ] bench.
> '11,200,000 per second.'
>
> The next one was an even older Cog VM running an image updated from 3.10.2.
> The VM didn't have the primitives required to fetch the VM information. The
> result were '8.22459348130374e6 per second.' and
> '1.19677724910036e7 per second.', respectively.
>
> The last image was a closure-enabled 3.10.2 running on the classic
> Interpreter VM. The results were '3.911442911417717e6 per second.' and
> '4.84567866426715e6 per second.', respectively.
>
> Since in all VMs the seemingly more complex code (primitiveCompareString with
> a subtraction) was quicker than the simpler code (primitiveDigitCompare), I
> suspect that LargeIntegersPlugin is compiled
> with less aggressive optimization than MiscPrimitivePlugin.
>
> What's also interesting is that, based on these benchmarks, the performance
> got worse over the years. I think invoking a primitive has its cost in newer
> VMs.
>
> Levente
>
> On Thu, 14 Apr 2016, David T. Lewis wrote:
>
>> Oops, I just realized that my "interpreter VM" results were from a
>> debugging
>> VM with compiler optimization off (too many VMs, sorry). Here are the
>> results
>> for a more representative interpreter VM. I can't explain the variation,
>> but
>> one conclusion is clear: my suggested "optimization" in the inbox is a bad
>> idea
>> (and I will move it to the treated inbox).
>>
>> "Base performance in the trunk level V3 image with interpreter VM"
>> Time millisecondsToRun: [100000000 timesRepeat: [ a = b ]]. "==> 26669"
>> Time millisecondsToRun: [100000000 timesRepeat: [ a = c ]]. "==> 25052"
>> Time millisecondsToRun: [100000000 timesRepeat: [ a = d ]]. "==> 25275"
>>
>> "Performance after adding LargePositiveInteger>>="
>> Time millisecondsToRun: [100000000 timesRepeat: [ a = b ]]. "==> 59224"
>> Time millisecondsToRun: [100000000 timesRepeat: [ a = c ]]. "==> 27824"
>> Time millisecondsToRun: [100000000 timesRepeat: [ a = d ]]. "==> 44324"
>>
>> Dave
>>
>>
>>
>> On Wed, Apr 13, 2016 at 08:25:33PM -0400, David T. Lewis wrote:
>>> Hi Levente,
>>>
>>> I think you may be right. I repeated my test with an interpreter VM on a
>>> 32-bit image (with loop count smaller because the interpreter VM is slower
>>> than Spur). The change that I put in the inbox does not have any benefit
>>> for the interpreter VM:
>>>
>>>
>>> a := 7432154326465436. "a big number"
>>> b := a + 1. "low order digit changed"
>>> c := 8432154326465436. "high order digit changed"
>>> d := 7432154026465436. "a digit in the middle changed"
>>>
>>> "Base performance in the trunk level V3 image with interpreter VM"
>>> Time millisecondsToRun: [5000000 timesRepeat: [ a = b ]]. "==> 3844"
>>> Time millisecondsToRun: [5000000 timesRepeat: [ a = c ]]. "==> 3786"
>>> Time millisecondsToRun: [5000000 timesRepeat: [ a = d ]].. "==> 3800"
>>>
>>> "Performance after adding LargePositiveInteger>>="
>>> Time millisecondsToRun: [5000000 timesRepeat: [ a = b ]]. "==> 3868"
>>> Time millisecondsToRun: [5000000 timesRepeat: [ a = c ]]. "==> 3775"
>>> Time millisecondsToRun: [5000000 timesRepeat: [ a = d ]]. "==> 3770"
>>>
>>> So yes it is something related to the VM. But I do not understand
>>> how #primDigitCompare could be so slow? As you say, maybe something
>>> is wrong with it.
>>>
>>> Thank you, I am glad that I asked for a review :-)
>>>
>>> Dave
>>>
>>>
>>> On Thu, Apr 14, 2016 at 01:45:42AM +0200, Levente Uzonyi wrote:
>>>> Hi Dave,
>>>>
>>>> I guess this is a VM related issue. On 64-bit Spur
>>>> [ 7432154326465436 digitCompare: 8432154326465436 ] bench.
>>>> returns '5,420,000 per second. 184 nanoseconds per run.'.
>>>>
>>>> Removing the primitive call from #digitCompare: the number goes up:
>>>> '20,200,000 per second. 49.4 nanoseconds per run.'.
>>>>
>>>> You might think that it's okay because the JIT is so good. But we have
>>>> another primitive to compare two bytes-objects. One which ought be
>>>> slower than #primDigitCompare:, because it maps the bytes before doing
>>>> the
>>>> comparison. I even subtracted two from its result to match the result of
>>>> #digitCompare:
>>>>
>>>> | order |
>>>> order := (0 to: 255) as: ByteArray.
>>>> [ (ByteString compare: 7432154326465436 with: 8432154326465436 collated:
>>>> order) - 2 ] bench.
>>>>
>>>> But it's still about twice as quick as #primDigitCompare:.
>>>> '9,590,000 per second. 104 nanoseconds per run.'.
>>>> So, something must be wrong with #primDigitCompare:.
>>>>
>>>> Levente
>>>>
>>>> On Wed, 13 Apr 2016, David T. Lewis wrote:
>>>>
>>>>> I would appreciate a review before moving this to trunk.
>>>>>
>>>>> Background:
>>>>>
>>>>> In UTCDateAndTime, most things are much faster than the trunk version.
>>>>> However, DataAndTime equality check did not improve for 32-bit images,
>>>>> so I ran it under AndreasSystemProfiler. Profiling showed that large
>>>>> integer equality checks spends time mostly in primDigitCompare, which
>>>>> is inefficient when only a simple byte comparison is needed.
>>>>>
>>>>> Here is the performance difference that I see on my system, 32-bit trunk
>>>>> Spur on Linux:
>>>>>
>>>>> | a b c d |
>>>>> a := 7432154326465436. "a big number"
>>>>> b := a + 1. "low order digit changed"
>>>>> c := 8432154326465436. "high order digit changed"
>>>>> d := 7432154026465436. "a digit in the middle changed"
>>>>>
>>>>> "Base performance in the trunk image"
>>>>> Time millisecondsToRun: [500000000 timesRepeat: [ a = b ]]. "==> 63733"
>>>>> Time millisecondsToRun: [500000000 timesRepeat: [ a = c ]]. "==> 63152"
>>>>> Time millisecondsToRun: [500000000 timesRepeat: [ a = d ]]. "==> 63581"
>>>>>
>>>>> "Performance after adding LargePositiveInteger>>="
>>>>> Time millisecondsToRun: [500000000 timesRepeat: [ a = b ]]. "==> 4676"
>>>>> Time millisecondsToRun: [500000000 timesRepeat: [ a = c ]]. "==> 4883"
>>>>> Time millisecondsToRun: [500000000 timesRepeat: [ a = d ]]. "==> 4512"
>>>>>
>>>>> Dave
>>>>>
>>>>>
>>>>>
>>>>> On Wed, Apr 13, 2016 at 10:57:28PM +0000, commits at source.squeak.org
>>>>> wrote:
>>>>>> David T. Lewis uploaded a new version of Kernel to project The Inbox:
>>>>>> http://source.squeak.org/inbox/Kernel-dtl.1015.mcz
>>>>>>
>>>>>> ==================== Summary ====================
>>>>>>
>>>>>> Name: Kernel-dtl.1015
>>>>>> Author: dtl
>>>>>> Time: 13 April 2016, 6:57:22.56608 pm
>>>>>> UUID: bd849f91-9b00-45c5-b2ab-891b420bde5e
>>>>>> Ancestors: Kernel-mt.1014
>>>>>>
>>>>>> Make large integer equality test be about 13 times faster. Implement #=
>>>>>> in LargePositiveInteger, and use digitAt: (primitive 60) for the
>>>>>> comparison.
>>>>>>
>>>>>> =============== Diff against Kernel-mt.1014 ===============
>>>>>>
>>>>>> Item was added:
>>>>>> + ----- Method: LargePositiveInteger>>= (in category 'comparing') -----
>>>>>> + = aNumber
>>>>>> +
>>>>>> + aNumber class == self class ifTrue: [
>>>>>> + aNumber size = self size ifFalse: [ ^false ].
>>>>>> + self size to: 1 by: -1 do: [ :i | (aNumber digitAt:
>>>>>> i) =
>>>>>> (self digitAt: i) ifFalse: [ ^ false ] ].
>>>>>> + ^ true ].
>>>>>> + aNumber isInteger ifTrue: [ ^false ].
>>>>>> + aNumber isNumber ifFalse: [ ^false ].
>>>>>> + ^aNumber adaptToInteger: self andCompare: #=!
>>>>>>
>>>>>
>>>>>
>>
>>
>
>
More information about the Vm-dev
mailing list