[squeak-dev] Integer arithmetic and bit operations in Squeak and Pharo (32bit & 64bit)

Eliot Miranda eliot.miranda at gmail.com
Mon Jan 9 02:24:32 UTC 2017


...and by the way, I got 3.4e-9 on 32-bit (down from 2.7e-9) when I
launched it after the 64-bit system and ran the benchmark while the 64-bit
system was still running.  So run on as quiet a machine as possible.

On Sun, Jan 8, 2017 at 4:07 PM, Eliot Miranda <eliot.miranda at gmail.com>
wrote:

> Hi Benoit,
>
> On Fri, Jan 6, 2017 at 11:59 PM, Benoit St-Jean <bstjean at yahoo.com> wrote:
>
>> Hello guys,
>>
>> A few questions/comments/remarks about integer arithmetic and bit
>> operations.  And I guess many of my interrogations most likely concern the
>> guys building the VM/primitives...
>>
>> I'm working on a chess engine that will rely heavily on bitboard
>> manipulations, i.e. lots of integer arithmetic with bit operations.  I was
>> comparing the 32bit & 64bit Spur VM for Squeak & Pharo (I want the chess
>> engine to be portable on Squeak & Pharo).  As I pointed out earlier on on
>> the Squeak-dev  mailing list in october 2016, there are a few things that
>> look strange to me and/or that I do not understand.
>>
>> 1) #bitXor: performs the same on the 32bit & 64bit VM for
>> LargePositiveInteger.  BUT, on average, #bitXor: is 15 (fifteen) times
>> *slower* on LargePositiveInteger than with SmallInteger, for both 32 & 64
>> bit VM/Image. Besides, with SmallInteger, #bitXor: is 4.5 faster on 32bit
>> as compared to 64bit!!
>>
>> You can test (on 32bit & 64bit) that with the following snippet:
>>
>> | small1 small2 small3 large1 large2 n timeSmall timeLarge |
>> small1 := 2r1111111111111111. "65535,  (1<<16)-1"
>> small2 := (1 << 30) - 1. "Largest SmallInteger on 32bit VM/Image,
>> SmallInteger maxVal"
>> small3 :=  1152921504606846975. "Largest Integer on 64bit VM/Image,
>> SmallInteger maxVal"
>> large1 := (1 << 64)-1. "64 bits set to 1"
>> large2 := (1 << 80)-1. "80 bits set to 1"
>>
>> n := 100000000.
>> timeSmall := Time millisecondsToRun: [n timesRepeat: [ small1 bitXor:
>> small2]].
>> timeLarge := Time millisecondsToRun: [n timesRepeat: [ large1 bitXor:
>> large2]].
>> Transcript cr;show: 'Time LargePositiveInteger : ', timeLarge printString.
>> Transcript cr;show: 'Time SmallInteger : ', timeSmall printString.
>>
>
> It's really important when running micro-benchmarks like this to minimize
> or eliminate other costs.  So you should unwind your inner loops, to at
> least to ten repetitions or more.  And it's wise to measure the time taken
> for an empty loop.  In these cases I'd consider comparing a loop containing
> 10 instances of the operation against one containing one, e.g. for a time
> in seconds you could use
>
>
> | small1 small2 n |
> n := 1000000.
> small1 := SmallInteger minVal.
> small2 := SmallInteger maxVal.
> (Time millisecondsToRun: [1 to: n do: [:i| small1 bitXor:  small2. small1
> bitXor:  small2. small1 bitXor:  small2. small1 bitXor:  small2. small1
> bitXor:  small2. small1 bitXor:  small2. small1 bitXor:  small2. small1
> bitXor:  small2. small1 bitXor:  small2. small1 bitXor:  small2. small1
> bitXor:  small2]]) - (Time millisecondsToRun: [1 to: n do: [:i| small1
> bitXor:  small2]]) / 10000.0 / n
>
> When I do this I get 2.4e-9 for 64-bit Squeak and 2.7e-9 for 32-bit
> Squeak.  Note that I used 1 to: n do: [:ignored| rather than timesRepeat:
> because I know its inlined and timesRepeat: isn't always.  But in any case
> I don't see the large difference in SmallInteger operation times that you
> do.  Could you repeat your measurements taking something similar to my
> approach?
>
>
>>
>> As was pointed out by Nicolas Cellier in a private communication, one
>> workaround is to add the method #bitXor: in the class LargePositiveInteger
>> to get a 3x performance boost.  Nicolas recommended the following code:
>>
>> LargePositiveInteger>>bitXor: arg
>>     <primitive:36>
>>     ^super bitXor: arg
>>
>> It's all good and nice (and I do get a 3x speedup, in 32&64 bit) but then
>> I wonder in which respect primitive 36 is different from the one
>> LargePositiveInteger usually inherits from Integer :
>>
>> (Integer>>bitXor:)
>> <primitive: 'primDigitBitXor' module:'LargeIntegers'>
>>
>> Why does Nicolas' workaround is able to do the job (calling primitive 36)
>> when there is already a primitive for the exact thing (<primitive:
>> 'primDigitBitXor' module:'LargeIntegers'>) ? Is this duplicate code or
>> there is a reason for this?  And why is the primDigitBitXor primitive so
>> slow as compared to primitive 36?
>>
>
> Primitive 36 deals with only 64-bit values (up to 8 byte LargeIntegers).   <primitive:
> 'primDigitBitXor' module:'LargeIntegers'> deals with arbitrary sized large
> integers.
>
>>
>> 2) One would normally expect the 64bit VM to be faster than the 32bit
>> VM.  That's true in all cases except...  Well, first, some numbers (Test
>> case attached to this email if you want to reproduce)
>>
>
> It depends on the computation.  One should expect the 64-bit VM to be
> faster for certain kinds of arithmetic because it can do more in one
> operation, e.g. 60-bit arithmetic should be much faster in 64-bits than in
> 32-bits, most floating point should be much faster because of the immediate
> floating-point representation.  But consider a purely symbolic computation,
> say tree walking.  The 64-bit version has to move twice as much data as the
> 32-bit version, so if the working set is large but amenable to execution in
> 32-bits one would expect the 32-bit application to run faster because it
> accesses half the data.
>
>
>
>>
>> 32bit
>> Number of #allMask: per second : 6.20M
>> Number of #anyMask: per second : 7.17M
>> Number of #bitAnd: per second : 8.45M
>> Number of #bitAt: per second : 55.15M
>> Number of #bitAt:put: per second : 37.22M
>> Number of #bitClear: per second : 5.18M
>> Number of #bitInvert per second : 6.53M
>> Number of #bitOr: per second : 9.18M
>> Number of #bitXor: per second : 8.97M
>> Number of #highBit per second : 43.23M
>> Number of #<< per second : 11.34M
>> Number of #lowBit per second : 69.44M
>> Number of #noMask: per second : 7.40M
>> Number of #>> per second : 12.36M
>>
>> 64bit
>> Number of #allMask: per second : 10.26M
>> Number of #anyMask: per second : 10.37M
>> Number of #bitAnd: per second : 17.00M
>> Number of #bitAt: per second : 15.89M
>> Number of #bitAt:put: per second : 10.36M
>> Number of #bitClear: per second : 6.44M
>> Number of #bitInvert per second : 9.11M
>> Number of #bitOr: per second : 18.45M
>> Number of #bitXor: per second : 15.38M
>> Number of #highBit per second : 7.66M
>> Number of #<< per second : 10.50M
>> Number of #lowBit per second : 13.49M
>> Number of #noMask: per second : 11.47M
>> Number of #>> per second : 10.52M
>>
>> There are a few surprises here.  First, #bitAt:, #bitAt:put:, #highBit
>> and #lowBit are a *lot faster* on the 32bit VM than on the 64bit VM! Does
>> anyone have an explanation for this?  Is this normal?
>>
>
> Can you rerun the numbers using my approach before we attempt to answer
> this?
>
>
>>
>> The other surprise is that #>> and #<< seem to be about the same speed on
>> both VM.  Any reason why the 64bit version isn't faster than the 32bit one?
>>
>> 3) I was surprised to find out that SmallInteger>>#maxVal wasn't a
>> constant (or the same on both VM).  It's (1<<60)-1 on the 64bit VM and
>> (1<<30)-1 on the 32bit VM.  So, be warned if your code depends on
>> SmallInteger>>#maxVal for some reason!
>>
>> 4) Is there any rationale/reason/explanation as to why, in Pharo,
>> LargePositiveInteger inherits from the class LargeInteger (which doesn't
>> exist in Squeak) ?
>>
>> Thanks in advance.
>>
>>
>> -----------------
>> BenoƮt St-Jean
>> Yahoo! Messenger: bstjean
>> Twitter: @BenLeChialeux
>> Pinterest: benoitstjean
>> Instagram: Chef_Benito
>> IRC: lamneth
>> Blogue: endormitoire.wordpress.com
>> "A standpoint is an intellectual horizon of radius zero".  (A. Einstein)
>>
>>
>>
>>
>
>
> --
> _,,,^..^,,,_
> best, Eliot
>



-- 
_,,,^..^,,,_
best, Eliot
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.squeakfoundation.org/pipermail/squeak-dev/attachments/20170108/5dc2b0b7/attachment.html>


More information about the Squeak-dev mailing list