[squeak-dev] Integer arithmetic and bit operations in Squeak and Pharo (32bit & 64bit)

Sat Jan 7 07:59:56 UTC 2017

Hello guys,

A few questions/comments/remarks about integer arithmetic and bit operations.  And I guess many of my interrogations most likely concern the guys building the VM/primitives...

I'm working on a chess engine that will rely heavily on bitboard manipulations, i.e. lots of integer arithmetic with bit operations.  I was comparing the 32bit & 64bit Spur VM for Squeak & Pharo (I want the chess engine to be portable on Squeak & Pharo).  As I pointed out earlier on on the Squeak-dev  mailing list in october 2016, there are a few things that look strange to me and/or that I do not understand.

1) #bitXor: performs the same on the 32bit & 64bit VM for LargePositiveInteger.  BUT, on average, #bitXor: is 15 (fifteen) times *slower* on LargePositiveInteger than with SmallInteger, for both 32 & 64 bit VM/Image. Besides, with SmallInteger, #bitXor: is 4.5 faster on 32bit as compared to 64bit!!  

You can test (on 32bit & 64bit) that with the following snippet:

| small1 small2 small3 large1 large2 n timeSmall timeLarge |
small1 := 2r1111111111111111. "65535,  (1<<16)-1"
small2 := (1 << 30) - 1. "Largest SmallInteger on 32bit VM/Image, SmallInteger maxVal"
small3 :=  1152921504606846975. "Largest Integer on 64bit VM/Image, SmallInteger maxVal"
large1 := (1 << 64)-1. "64 bits set to 1" 
large2 := (1 << 80)-1. "80 bits set to 1"

n := 100000000.
timeSmall := Time millisecondsToRun: [n timesRepeat: [ small1 bitXor:  small2]].
timeLarge := Time millisecondsToRun: [n timesRepeat: [ large1 bitXor:  large2]].
Transcript cr;show: 'Time LargePositiveInteger : ', timeLarge printString.
Transcript cr;show: 'Time SmallInteger : ', timeSmall printString.

As was pointed out by Nicolas Cellier in a private communication, one workaround is to add the method #bitXor: in the class LargePositiveInteger to get a 3x performance boost.  Nicolas recommended the following code:

LargePositiveInteger>>bitXor: arg
    <primitive:36>
    ^super bitXor: arg

It's all good and nice (and I do get a 3x speedup, in 32&64 bit) but then I wonder in which respect primitive 36 is different from the one LargePositiveInteger usually inherits from Integer :

(Integer>>bitXor:)
<primitive: 'primDigitBitXor' module:'LargeIntegers'>

Why does Nicolas' workaround is able to do the job (calling primitive 36) when there is already a primitive for the exact thing (<primitive: 'primDigitBitXor' module:'LargeIntegers'>) ? Is this duplicate code or there is a reason for this?  And why is the primDigitBitXor primitive so slow as compared to primitive 36? 

2) One would normally expect the 64bit VM to be faster than the 32bit VM.  That's true in all cases except...  Well, first, some numbers (Test case attached to this email if you want to reproduce)

32bit
Number of #allMask: per second : 6.20M
Number of #anyMask: per second : 7.17M
Number of #bitAnd: per second : 8.45M
Number of #bitAt: per second : 55.15M
Number of #bitAt:put: per second : 37.22M
Number of #bitClear: per second : 5.18M
Number of #bitInvert per second : 6.53M
Number of #bitOr: per second : 9.18M
Number of #bitXor: per second : 8.97M
Number of #highBit per second : 43.23M
Number of #<< per second : 11.34M
Number of #lowBit per second : 69.44M
Number of #noMask: per second : 7.40M
Number of #>> per second : 12.36M

64bit
Number of #allMask: per second : 10.26M
Number of #anyMask: per second : 10.37M
Number of #bitAnd: per second : 17.00M
Number of #bitAt: per second : 15.89M
Number of #bitAt:put: per second : 10.36M
Number of #bitClear: per second : 6.44M
Number of #bitInvert per second : 9.11M
Number of #bitOr: per second : 18.45M
Number of #bitXor: per second : 15.38M
Number of #highBit per second : 7.66M
Number of #<< per second : 10.50M
Number of #lowBit per second : 13.49M
Number of #noMask: per second : 11.47M
Number of #>> per second : 10.52M

There are a few surprises here.  First, #bitAt:, #bitAt:put:, #highBit and #lowBit are a *lot faster* on the 32bit VM than on the 64bit VM! Does anyone have an explanation for this?  Is this normal?

The other surprise is that #>> and #<< seem to be about the same speed on both VM.  Any reason why the 64bit version isn't faster than the 32bit one?

3) I was surprised to find out that SmallInteger>>#maxVal wasn't a constant (or the same on both VM).  It's (1<<60)-1 on the 64bit VM and (1<<30)-1 on the 32bit VM.  So, be warned if your code depends on SmallInteger>>#maxVal for some reason!

4) Is there any rationale/reason/explanation as to why, in Pharo, LargePositiveInteger inherits from the class LargeInteger (which doesn't exist in Squeak) ?

Thanks in advance.

 ----------------- 
Benoît St-Jean 
Yahoo! Messenger: bstjean 
Twitter: @BenLeChialeux 
Pinterest: benoitstjean 
Instagram: Chef_Benito
IRC: lamneth 
Blogue: endormitoire.wordpress.com 
"A standpoint is an intellectual horizon of radius zero".  (A. Einstein)
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.squeakfoundation.org/pipermail/squeak-dev/attachments/20170107/f55f13f9/attachment.html>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: TestBitOperations.st
Type: application/octet-stream
Size: 16480 bytes
Desc: not available
URL: <http://lists.squeakfoundation.org/pipermail/squeak-dev/attachments/20170107/f55f13f9/attachment.obj>