[Vm-dev] Re: primitiveDigitCompare is slow (was: Re: [squeak-dev] The Inbox: Kernel-dtl.1015.mcz)

Eliot Miranda eliot.miranda at gmail.com
Sat Apr 16 21:54:23 UTC 2016


Hi Nicolas,

   IMO we need a different API for checking classes, something that via
macros compiles to direct class access for internal plugins.

I was also going to suggest using the VM Profiler:

"The VMProfiler will tell you exactly where:
 http://source.squeak.org/VMMaker/CogTools
then screen menu open... VM Profiler
then put an expression in the bottom right hand box, and hit  Profile,
then in the left=hand list of function names, select VM report"

But the UnixOSProcessPlugin as compiled by my mac build scripts and run
under Spur on Mac OS X does not appear to load.  The VM Profiler depends on
OS Process to extract addresses from shared libraries.  I'm looking at it
now :-(


On Sat, Apr 16, 2016 at 2:44 PM, Nicolas Cellier <
nicolas.cellier.aka.nice at gmail.com> wrote:

>
> Bingo,
> here are the results without isKindOf checks:
>
> [ 7432154326465436 digitCompare: 8432154326465436 ] bench.
>  '16,100,000 per second. 62.1 nanoseconds per run.'
>
> that is more or less the speed of ByteArray comparison.
>
> I think I will make a new pass on LargeIntegersPlugin :)
>
> 2016-04-16 22:54 GMT+02:00 Nicolas Cellier <
> nicolas.cellier.aka.nice at gmail.com>:
>
>> Ah but maybe the "Smart" side of the plugin is causing trouble:
>>
>>     firstInteger := self
>>                 primitive: 'primDigitCompare'
>>                 parameters: #(#Integer )
>>                 receiver: #Integer.
>>
>> translates as:
>>
>>         success(isKindOf(stackValue(0), "Integer"));
>>         secondInteger = stackValue(0);
>>         /* missing DebugCode */;
>>         success(isKindOf(stackValue(1), "Integer"));
>>         firstInteger = stackValue(1);
>>
>> It might be faster to just check the 3 cases:
>>
>> (interpreterProxy isIntegerObject: oop) or:
>>     [oopClass := interpreterProxy fetchClassOf: oop.
>>      oopClass == interpreterProxy classLargeNegativeInteger or: [oopClass
>> == interpreterProxy classLargePositiveInteger]].
>>
>> Moreover, we already test isIntegerObject further in the primitive code.
>> Since every LargeIntegersPlugin primitive is going thru this isKindOf
>> check, there might be some low hanging fruits.
>>
>> 2016-04-16 19:37 GMT+02:00 Nicolas Cellier <
>> nicolas.cellier.aka.nice at gmail.com>:
>>
>>> Hi,
>>> the primitive does nothing very special
>>> 1) check for SmallInteger cases first, for quick return
>>> 2) check for LargeIntegers length then if both receiver & argument are
>>> Large
>>> 3) check for LargeIntegers digits then if both have same length
>>>
>>> None of these 3 steps is expected to be slow.
>>>
>>> A bleeding age VM does the 3rd step using 32 bits limbs instead of 8bits
>>> but this does not change a thing about performances ratio, it could only
>>> make a difference for giant integers, these ones fit on 56 bits...
>>> I got these with 32bits Cog Sput MacOSX :
>>>
>>> [ 7432154326465436 digitCompare: 8432154326465436 ] bench.
>>>  '8,980,000 per second. 111 nanoseconds per run.'
>>>
>>> | order |
>>> order := (0 to: 255) as: ByteArray.
>>> [ (ByteString compare: 7432154326465436 with: 8432154326465436 collated:
>>> order) - 2 ] bench.
>>>  '16,400,000 per second. 60.8 nanoseconds per run.'
>>>
>>> Obviously, most time is spent elsewhere, it would be interesting to know
>>> where exactly.
>>>
>>> Nicolas
>>>
>>>
>>> 2016-04-16 18:52 GMT+02:00 David T. Lewis <lewis at mail.msen.com>:
>>>
>>>>
>>>> It does seem odd that it would be different on Spur. You suggested that
>>>> compiler optimization might be set differently for the two plugins. I
>>>> cannot check (I'm on an Ubuntu laptop, cannot build Spur due to some
>>>> kind of autotools problem that I can't figure out right now), but if you
>>>> are able to do a VM compile and save the console output to a file, then
>>>> I think you can look at the CFLAGS setting for the two plugins ($ nohup
>>>> ./mvm,
>>>> but first comment out the part of mvm that asks if you want to clean).
>>>>
>>>> Aside from that, the actual generated code for primDigitCompare is
>>>> doing quite a lot of stuff before it ever gets around to comparing the
>>>> digits. I suspect it is not a very efficient primitive.
>>>>
>>>> Dave
>>>>
>>>>
>>>> On Fri, Apr 15, 2016 at 05:52:02PM +0200, Levente Uzonyi wrote:
>>>> >
>>>> > This thread got derailed, and I feel like we should go back to its
>>>> > main point, which was that #digitCompare: using
>>>> #primitiveDigitCompare is
>>>> > slow.
>>>> > It's so slow, that #primitiveCompareString is about twice as quick in
>>>> > Spur VMs, which is odd.
>>>> >
>>>> > Levente
>>>> >
>>>> > On Thu, 14 Apr 2016, Levente Uzonyi wrote:
>>>> >
>>>> > >Hi Dave,
>>>> > >
>>>> > >I dag a bit deeper into this problem, and I found that it's been
>>>> around
>>>> > >for ages. I compared the two primitives in three older images.
>>>> > >The first one is Squeak 4.2 running on Cog r2714:
>>>> > >
>>>> > >[ 7432154326465436 digitCompare: 8432154326465436 ] bench.
>>>> > >'6,880,000 per second.'
>>>> > >
>>>> > >| order |
>>>> > >order := (0 to: 255) as: ByteArray.
>>>> > >[ (ByteString compare: 7432154326465436 with: 8432154326465436
>>>> collated:
>>>> > >order) - 2 ] bench.
>>>> > >'11,200,000 per second.'
>>>> > >
>>>> > >The next one was an even older Cog VM running an image updated from
>>>> > >3.10.2. The VM didn't have the primitives required to fetch the VM
>>>> > >information. The result were '8.22459348130374e6 per second.' and
>>>> > >'1.19677724910036e7 per second.', respectively.
>>>> > >
>>>> > >The last image was a closure-enabled 3.10.2 running on the classic
>>>> > >Interpreter VM. The results were '3.911442911417717e6 per second.'
>>>> and
>>>> > >'4.84567866426715e6 per second.', respectively.
>>>> > >
>>>> > >Since in all VMs the seemingly more complex code
>>>> (primitiveCompareString
>>>> > >with a subtraction) was quicker than the simpler code
>>>> > >(primitiveDigitCompare), I suspect that LargeIntegersPlugin is
>>>> compiled
>>>> > >with less aggressive optimization than MiscPrimitivePlugin.
>>>> > >
>>>> > >What's also interesting is that, based on these benchmarks, the
>>>> > >performance got worse over the years. I think invoking a primitive
>>>> has its
>>>> > >cost in newer VMs.
>>>> > >
>>>> > >Levente
>>>> > >
>>>> > >On Thu, 14 Apr 2016, David T. Lewis wrote:
>>>> > >
>>>> > >>Oops, I just realized that my "interpreter VM" results were from a
>>>> > >>debugging
>>>> > >>VM with compiler optimization off (too many VMs, sorry). Here are
>>>> the
>>>> > >>results
>>>> > >>for a more representative interpreter VM. I can't explain the
>>>> variation,
>>>> > >>but
>>>> > >>one conclusion is clear: my suggested "optimization" in the inbox
>>>> is a
>>>> > >>bad idea
>>>> > >>(and I will move it to the treated inbox).
>>>> > >>
>>>> > >> "Base performance in the trunk level V3 image with interpreter VM"
>>>> > >> Time millisecondsToRun: [100000000 timesRepeat: [ a = b ]]. "==>
>>>> 26669"
>>>> > >> Time millisecondsToRun: [100000000 timesRepeat: [ a = c ]]. "==>
>>>> 25052"
>>>> > >> Time millisecondsToRun: [100000000 timesRepeat: [ a = d ]]. "==>
>>>> 25275"
>>>> > >>
>>>> > >> "Performance after adding LargePositiveInteger>>="
>>>> > >> Time millisecondsToRun: [100000000 timesRepeat: [ a = b ]]. "==>
>>>> 59224"
>>>> > >> Time millisecondsToRun: [100000000 timesRepeat: [ a = c ]].  "==>
>>>> 27824"
>>>> > >> Time millisecondsToRun: [100000000 timesRepeat: [ a = d ]]. "==>
>>>> 44324"
>>>> > >>
>>>> > >>Dave
>>>> > >>
>>>> > >>
>>>> > >>
>>>> > >>On Wed, Apr 13, 2016 at 08:25:33PM -0400, David T. Lewis wrote:
>>>> > >>>Hi Levente,
>>>> > >>>
>>>> > >>>I think you may be right. I repeated my test with an interpreter
>>>> VM on a
>>>> > >>>32-bit image (with loop count smaller because the interpreter VM is
>>>> > >>>slower
>>>> > >>>than Spur). The change that I put in the inbox does not have any
>>>> benefit
>>>> > >>>for the interpreter VM:
>>>> > >>>
>>>> > >>>
>>>> > >>>  a := 7432154326465436. "a big number"
>>>> > >>>  b := a + 1. "low order digit changed"
>>>> > >>>  c := 8432154326465436. "high order digit changed"
>>>> > >>>  d := 7432154026465436. "a digit in the middle changed"
>>>> > >>>
>>>> > >>>   "Base performance in the trunk level V3 image with interpreter
>>>> VM"
>>>> > >>>  Time millisecondsToRun: [5000000 timesRepeat: [ a = b ]]. "==>
>>>> 3844"
>>>> > >>>  Time millisecondsToRun: [5000000 timesRepeat: [ a = c ]]. "==>
>>>> 3786"
>>>> > >>>  Time millisecondsToRun: [5000000 timesRepeat: [ a = d ]].. "==>
>>>> 3800"
>>>> > >>>
>>>> > >>>  "Performance after adding LargePositiveInteger>>="
>>>> > >>>  Time millisecondsToRun: [5000000 timesRepeat: [ a = b ]]. "==>
>>>> 3868"
>>>> > >>>  Time millisecondsToRun: [5000000 timesRepeat: [ a = c ]]. "==>
>>>> 3775"
>>>> > >>>  Time millisecondsToRun: [5000000 timesRepeat: [ a = d ]]. "==>
>>>> 3770"
>>>> > >>>
>>>> > >>>So yes it is something related to the VM. But I do not understand
>>>> > >>>how #primDigitCompare could be so slow? As you say, maybe something
>>>> > >>>is wrong with it.
>>>> > >>>
>>>> > >>>Thank you, I am glad that I asked for a review :-)
>>>> > >>>
>>>> > >>>Dave
>>>> > >>>
>>>> > >>>
>>>> > >>>On Thu, Apr 14, 2016 at 01:45:42AM +0200, Levente Uzonyi wrote:
>>>> > >>>>Hi Dave,
>>>> > >>>>
>>>> > >>>>I guess this is a VM related issue. On 64-bit Spur
>>>> > >>>>[ 7432154326465436 digitCompare: 8432154326465436 ] bench.
>>>> > >>>>returns '5,420,000 per second. 184 nanoseconds per run.'.
>>>> > >>>>
>>>> > >>>>Removing the primitive call from #digitCompare: the number goes
>>>> up:
>>>> > >>>>'20,200,000 per second. 49.4 nanoseconds per run.'.
>>>> > >>>>
>>>> > >>>>You might think that it's okay because the JIT is so good. But we
>>>> have
>>>> > >>>>another primitive to compare two bytes-objects. One which ought be
>>>> > >>>>slower than #primDigitCompare:, because it maps the bytes before
>>>> doing
>>>> > >>>>the
>>>> > >>>>comparison. I even subtracted two from its result to match the
>>>> result of
>>>> > >>>>#digitCompare:
>>>> > >>>>
>>>> > >>>>| order |
>>>> > >>>>order := (0 to: 255) as: ByteArray.
>>>> > >>>>[ (ByteString compare: 7432154326465436 with: 8432154326465436
>>>> collated:
>>>> > >>>>order) - 2 ] bench.
>>>> > >>>>
>>>> > >>>>But it's still about twice as quick as #primDigitCompare:.
>>>> > >>>>'9,590,000 per second. 104 nanoseconds per run.'.
>>>> > >>>>So, something must be wrong with #primDigitCompare:.
>>>> > >>>>
>>>> > >>>>Levente
>>>> > >>>>
>>>> > >>>>On Wed, 13 Apr 2016, David T. Lewis wrote:
>>>> > >>>>
>>>> > >>>>>I would appreciate a review before moving this to trunk.
>>>> > >>>>>
>>>> > >>>>>Background:
>>>> > >>>>>
>>>> > >>>>>In UTCDateAndTime, most things are much faster than the trunk
>>>> version.
>>>> > >>>>>However, DataAndTime equality check did not improve for 32-bit
>>>> images,
>>>> > >>>>>so I ran it under AndreasSystemProfiler. Profiling showed that
>>>> large
>>>> > >>>>>integer equality checks spends time mostly in primDigitCompare,
>>>> which
>>>> > >>>>>is inefficient when only a simple byte comparison is needed.
>>>> > >>>>>
>>>> > >>>>>Here is the performance difference that I see on my system,
>>>> 32-bit
>>>> > >>>>>trunk
>>>> > >>>>>Spur on Linux:
>>>> > >>>>>
>>>> > >>>>>| a b c d |
>>>> > >>>>>a := 7432154326465436. "a big number"
>>>> > >>>>>b := a + 1. "low order digit changed"
>>>> > >>>>>c := 8432154326465436. "high order digit changed"
>>>> > >>>>>d := 7432154026465436. "a digit in the middle changed"
>>>> > >>>>>
>>>> > >>>>>"Base performance in the trunk image"
>>>> > >>>>>Time millisecondsToRun: [500000000 timesRepeat: [ a = b ]]. "==>
>>>> 63733"
>>>> > >>>>>Time millisecondsToRun: [500000000 timesRepeat: [ a = c ]]. "==>
>>>> 63152"
>>>> > >>>>>Time millisecondsToRun: [500000000 timesRepeat: [ a = d ]]. "==>
>>>> 63581"
>>>> > >>>>>
>>>> > >>>>>"Performance after adding LargePositiveInteger>>="
>>>> > >>>>>Time millisecondsToRun: [500000000 timesRepeat: [ a = b ]]. "==>
>>>> 4676"
>>>> > >>>>>Time millisecondsToRun: [500000000 timesRepeat: [ a = c ]]. "==>
>>>> 4883"
>>>> > >>>>>Time millisecondsToRun: [500000000 timesRepeat: [ a = d ]]. "==>
>>>> 4512"
>>>> > >>>>>
>>>> > >>>>>Dave
>>>> > >>>>>
>>>> > >>>>>
>>>> > >>>>>
>>>> > >>>>>On Wed, Apr 13, 2016 at 10:57:28PM +0000,
>>>> commits at source.squeak.org
>>>> > >>>>>wrote:
>>>> > >>>>>>David T. Lewis uploaded a new version of Kernel to project The
>>>> Inbox:
>>>> > >>>>>>http://source.squeak.org/inbox/Kernel-dtl.1015.mcz
>>>> > >>>>>>
>>>> > >>>>>>==================== Summary ====================
>>>> > >>>>>>
>>>> > >>>>>>Name: Kernel-dtl.1015
>>>> > >>>>>>Author: dtl
>>>> > >>>>>>Time: 13 April 2016, 6:57:22.56608 pm
>>>> > >>>>>>UUID: bd849f91-9b00-45c5-b2ab-891b420bde5e
>>>> > >>>>>>Ancestors: Kernel-mt.1014
>>>> > >>>>>>
>>>> > >>>>>>Make large integer equality test be about 13 times faster.
>>>> Implement
>>>> > >>>>>>#=
>>>> > >>>>>>in LargePositiveInteger, and use digitAt: (primitive 60) for the
>>>> > >>>>>>comparison.
>>>> > >>>>>>
>>>> > >>>>>>=============== Diff against Kernel-mt.1014 ===============
>>>> > >>>>>>
>>>> > >>>>>>Item was added:
>>>> > >>>>>>+ ----- Method: LargePositiveInteger>>= (in category
>>>> 'comparing')
>>>> > >>>>>>-----
>>>> > >>>>>>+ = aNumber
>>>> > >>>>>>+
>>>> > >>>>>>+       aNumber class == self class ifTrue: [
>>>> > >>>>>>+               aNumber size = self size ifFalse: [ ^false ].
>>>> > >>>>>>+               self size to: 1 by: -1 do: [ :i | (aNumber
>>>> digitAt:
>>>> > >>>>>>i) =
>>>> > >>>>>>(self digitAt: i) ifFalse: [ ^ false ] ].
>>>> > >>>>>>+               ^ true ].
>>>> > >>>>>>+       aNumber isInteger ifTrue: [ ^false ].
>>>> > >>>>>>+       aNumber isNumber ifFalse: [ ^false ].
>>>> > >>>>>>+       ^aNumber adaptToInteger: self andCompare: #=!
>>>> > >>>>>>
>>>> > >>>>>
>>>> > >>>>>
>>>> > >>
>>>> > >>
>>>> > >
>>>> > >
>>>>
>>>
>>>
>>
>
>


-- 
_,,,^..^,,,_
best, Eliot
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.squeakfoundation.org/pipermail/vm-dev/attachments/20160416/cd34eac8/attachment-0001.htm


More information about the Vm-dev mailing list