[squeak-dev] ByteArray accessors for 64-bit manipulation

Levente Uzonyi leves at elte.hu
Wed Sep 9 21:11:18 UTC 2015


After thinking a bit more about this, I came to the conclusion that while 
I paid attention to handle reader methods in 64-bit images correctly, 
some of the writer methods are incorrect for "large" SmallInteger inputs.
First I implemented the missing parts for 61-bit SmallIntegers using a 
general purpose loop, but it turned out that the loop is significantly 
faster for smaller numbers, and only slightly slower for larger ones, so I 
decided to nuke the special case for 30-bit SmallIntegers.
I also found further possiblities to optimize writes, so the average is 
down from 259 to 236 milliseconds. Chris Muller's benchmark gives

{
'cbc smallN write'->'12,200,000 per second. 81.7 nanoseconds per run.' .
'cbc smallN access'->'10,300,000 per second. 96.8 nanoseconds per run.' .
'cbc largeN write'->'12,400,000 per second. 80.4 nanoseconds per run.' .
'cbc largeN access'->'5,270,000 per second. 190 nanoseconds per run.'}

Which means +20% speed for smallN writes, and +18% for largeN access.

Levente

P.S.: I still couldn't test the code in my 64-bit Spur image, because it 
won't respond to any input after startup.

On Tue, 8 Sep 2015, Levente Uzonyi wrote:

> Hi Chris,
>
> I added the #normalize sends to avoid the creation of spurious LargeIntegers 
> in 64-bit images (there are two places where I relied on #-'s ability to work 
> on unnormalized input). I didn't have a chance to test it, but I expect it to 
> work correctly. Even if the code is sub-optimal in 64-bit, it shouldn't be 
> any slower than in 32-bit.
>
> Levente
>
> On Tue, 8 Sep 2015, Chris Cunningham wrote:
>
>> Levente,
>> Interesting.  I have a question and a concern about your implementation, 
>> though:
>> 
>> Question: why in the micro checks, is the Write faster than the Access:
>> {'cbc smallN write'->'10,400,000 per second. 95.8 nanoseconds per run.'.
>> 'cbc smallN access'->'10,300,000 per second. 97.4 nanoseconds per run.'.
>> 'cbc largeN write'->'12,400,000 per second. 80.4 nanoseconds per run.'.
>> 'cbc largeN access'->'3,920,000 per second. 255 nanoseconds per run.'}.
>> yet in your more thorough benchmark, the Write twice as slow as the Access?
>> "unsigned long 64"
>> (put, or Write) "average asFloat 536.109375".
>> (Access) "average asFloat 259.359375"
>> any ideas?
>> 
>> the concern is that your code is nicely optimized for our current 32bit vm 
>> - but once we go to 64 bit, I think it will fail.  Should we be
>> concerned?
>> 
>> -cbc
>> 
>> On Tue, Sep 8, 2015 at 2:42 AM, Levente Uzonyi <leves at elte.hu> wrote:
>>       Hi All,
>>
>>       A bit later than I wanted to, but I've finally uploaded my versions 
>> to the Trunk. I guess I went as far as possible with getting the
>>       "fastest implementation".
>>       I modified your benchmark to use the same numbers, so that the 
>> measurements could be repeated. I got the following:
>>
>>       Before:
>>       {'cbc smallN write'->'3,710,000 per second. 269 nanoseconds per 
>> run.'.
>>       'cbc smallN access'->'12,000,000 per second. 83.4 nanoseconds per 
>> run.'.
>>       'cbc largeN write'->'5,430,000 per second. 184 nanoseconds per run.'.
>>       'cbc largeN access'->'1,370,000 per second. 732 nanoseconds per 
>> run.'}.
>>
>>       After:
>>       {'cbc smallN write'->'10,400,000 per second. 95.8 nanoseconds per 
>> run.'.
>>       'cbc smallN access'->'10,300,000 per second. 97.4 nanoseconds per 
>> run.'.
>>       'cbc largeN write'->'12,400,000 per second. 80.4 nanoseconds per 
>> run.'.
>>       'cbc largeN access'->'3,920,000 per second. 255 nanoseconds per 
>> run.'}.
>>
>>       As you can see, everything became faster except for smallN access. 
>> This is the side-effect of optimizing for the average case instead
>>       of specific cases - like zero bytes. I decided not to use that trick, 
>> because it decreased the overall performance.
>>
>>       I also wrote a benchmark which measures reads and writes together. It 
>> generates random numbers which can be represented using a given
>>       number of bits. The result is an array of run times where values 
>> having and odd index belong to big-endian access, and even ones to
>>       little-endian.
>>
>>       | byteArray inputs random storageBits unsigned |
>>       Smalltalk garbageCollect.
>>       random := Random seed: 36rSqueak.
>>       storageBits := 64.
>>       unsigned := true.
>>       byteArray := ByteArray new: storageBits // 8 * 2.
>>       inputs := Array new: 100000.
>>       (2 to: storageBits * 2 + 1) collect: [ :descriptor |
>>               "lowest bit describes endianness, the rest the number of 
>> bits."
>>               | limit bigEndian offset |
>>               bigEndian := descriptor odd.
>>               limit := 1 << (descriptor >> 1) - 1.
>>               unsigned
>>                       ifTrue: [ offset := -1 ]
>>                       ifFalse: [ offset := -1- (limit >> 1) ].
>>               inputs replace: [ :each | (random nextInt: limit) + offset ].
>>               [ 1 to: byteArray size - (storageBits // 8 - 1) do: [ 
>> :startIndex |
>>                       1 to: inputs size do: [ :inputIndex |
>>                               byteArray
>>                                       unsignedLong64At: startIndex
>>                                               put: (inputs at: inputIndex)
>>                                               bigEndian: bigEndian;
>>                                       unsignedLong64At: startIndex
>>                                               bigEndian: bigEndian ] ] ] 
>> timeToRun ].
>>
>>       I ran it with various accessors and got the following results:
>>
>>       "short"
>>       #(28 28 26 26 26 28 26 28 26 28 28 28 26 28 28 28 28 28 28 30 28 28 
>> 28 28 28 28 28 28 26 28 28 28) "average asFloat 27.625".
>>       #(16 18 18 20 18 20 20 20 18 20 18 18 20 20 20 20 20 20 20 20 18 20 
>> 20 20 20 20 20 22 20 20 20 20) "average asFloat 19.5".
>>
>>       "long"
>>       #(62 62 66 68 68 70 68 70 68 70 68 70 68 70 68 70 68 70 70 74 70 72 
>> 70 72 72 74 72 72 70 74 70 72 70 72 72 76 72 76 72 76 72 76 72 74
>>       72 76 70 76 72 76 70 76 72 76 72 74 72 76 72 74 72 76 570 584) 
>> "average asFloat 87.28125".
>>       #(66 66 70 70 72 72 72 72 72 72 74 72 72 74 72 72 74 72 74 72 72 72 
>> 72 72 74 72 74 72 72 72 72 74 72 74 72 72 72 72 72 74 74 72 72 74
>>       74 72 72 72 72 72 72 72 72 72 72 72 72 72 72 72 74 72 116 122) 
>> "average asFloat 73.625".
>>
>>       "unsigned short"
>>       #(18 18 18 20 16 18 18 18 18 18 18 18 18 20 18 20 18 18 18 18 18 20 
>> 20 20 20 20 18 20 18 18 18 18) "average asFloat 18.5".
>>       #(18 18 18 20 20 18 18 18 18 18 18 18 18 18 18 18 18 18 18 18 18 18 
>> 18 18 18 18 18 18 18 18 18 18) "average asFloat 18.125".
>>
>>       "unsigned long"
>>       #(46 48 48 50 50 50 48 48 50 48 48 48 46 48 46 48 52 54 52 52 52 54 
>> 52 54 52 52 54 54 52 54 52 54 58 58 58 58 58 58 58 58 58 58 56 58
>>       60 58 56 56 60 62 60 62 62 62 60 62 60 62 62 62 384 400 520 694) 
>> "average asFloat 82.40625".
>>        #(62 62 62 64 64 62 62 62 62 64 64 64 64 64 64 64 62 62 64 62 64 62 
>> 64 64 64 64 64 64 64 64 64 64 64 64 62 62 64 64 64 64 62 64 64 64
>>       64 64 64 64 64 64 64 64 64 64 64 64 64 64 64 62 100 108 106 298) 
>> "average asFloat 69.09375".
>>
>>       "unsigned long 64"
>>       #(300 300 300 300 300 300 300 300 300 300 300 300 300 298 302 300 312 
>> 306 308 310 308 306 308 308 310 308 308 308 310 308 312 308 318
>>       316 314 318 316 316 318 316 318 316 316 316 318 318 316 316 326 324 
>> 326 322 326 322 328 324 326 322 326 322 510 520 592 592 634 618
>>       636 640 652 666 642 644 660 648 642 660 652 646 662 658 636 648 626 
>> 632 650 628 632 612 632 620 622 636 626 626 644 632 750 748 812
>>       822 828 858 842 862 898 880 896 840 870 896 926 870 1034 846 880 834 
>> 876 824 860 818 848 824 826 864 820 848 820 828) "average asFloat
>>       536.109375".
>>       #(166 174 168 174 170 176 168 172 166 172 164 170 166 170 166 172 166 
>> 170 166 172 166 172 166 170 166 170 164 170 170 170 168 176 164
>>       170 166 172 166 172 164 174 166 170 168 172 166 172 166 172 166 170 
>> 164 170 166 172 164 172 166 172 166 170 238 272 264 484 282 344
>>       284 356 292 362 294 364 288 362 292 366 294 368 290 364 294 374 294 
>> 374 296 370 294 374 288 370 290 366 290 368 292 364 302 382 304
>>       388 302 390 298 392 298 384 302 388 302 390 298 386 308 398 304 400 
>> 504 402 298 402 298 398 302 398 294 400 298 396). "average asFloat
>>       259.359375"
>> 
>>
>>       Levente
>>
>>       On Sun, 30 Aug 2015, Chris Muller wrote:
>>
>>             Hi Chris, I think these methods belong in the image with the 
>> fastest
>>             implementation we can do.
>>
>>             I implemented 64-bit unsigned access for Ma Serializer back in 
>> 2005.
>>             I modeled my implementation after Andreas' original approach 
>> which
>>             tries to avoid LI arithmetic.  I was curious whether your
>>             implementations would be faster, because if they are then it 
>> could
>>             benefit Magma.  After loading "Ma Serializer" 1.5 (or head) 
>> into a
>>             trunk image, I used the following script to take comparison
>>             measurements:
>>
>>             | smallN largeN maBa cbBa |  smallN := ((2 raisedTo: 13) to: (2
>>             raisedTo: 14)) atRandom.
>>             largeN := ((2 raisedTo: 63) to: (2 raisedTo: 64)) atRandom.
>>             maBa := ByteArray new: 8.
>>             cbBa := ByteArray new: 8.
>>             maBa maUint: 64 at: 0 put: largeN.
>>             cbBa unsignedLong64At: 1 put: largeN bigEndian: false.
>>             self assert: (cbBa maUnsigned64At: 1) = (maBa unsignedLong64At: 
>> 1
>>             bigEndian: false).
>>             { 'cbc smallN write' -> [ cbBa unsignedLong64At: 1 put: smallN
>>             bigEndian: false] bench.
>>             'ma smallN write' -> [cbBa maUint: 64 at: 0 put: smallN ] 
>> bench.
>>             'cbc smallN access' -> [ cbBa unsignedLong64At: 1 bigEndian: 
>> false. ] bench.
>>             'ma smallN access' -> [ cbBa maUnsigned64At: 1] bench.
>>             'cbc largeN write' -> [ cbBa unsignedLong64At: 1 put: largeN
>>             bigEndian: false] bench.
>>             'ma largeN write' -> [cbBa maUint: 64 at: 0 put: largeN ] 
>> bench.
>>             'cbc largeN access' -> [ cbBa unsignedLong64At: 1 bigEndian: 
>> false ] bench.
>>             'ma largeN access' -> [ cbBa maUnsigned64At: 1] bench.
>>             }
>>
>>             Here are the results:
>>
>>             'cbc smallN write'->'3,110,000 per second.  322 nanoseconds per 
>> run.' .
>>             'ma smallN write'->'4,770,000 per second.  210 nanoseconds per 
>> run.' .
>>             'cbc smallN access'->'4,300,000 per second.  233 nanoseconds 
>> per run.' .
>>             'ma smallN access'->'16,400,000 per second.  60.9 nanoseconds 
>> per run.' .
>>             'cbc largeN write'->'907,000 per second.  1.1 microseconds per 
>> run.' .
>>             'ma largeN write'->'6,620,000 per second.  151 nanoseconds per 
>> run.' .
>>             'cbc largeN access'->'1,900,000 per second.  527 nanoseconds 
>> per run.' .
>>             'ma largeN access'->'1,020,000 per second.  982 nanoseconds per 
>> run.'
>>
>>             It looks like your 64-bit access is 86% faster for accessing 
>> the
>>             high-end of the 64-bit range, but slower in the other 3 
>> metrics.
>>             Noticeably, it was only 14% as fast for writing the high-end of 
>> the
>>             64-bit range, and similarly as much slower for small-number 
>> access..
>> 
>>
>>             On Fri, Aug 28, 2015 at 6:01 PM, Chris Cunningham
>>             <cunningham.cb at gmail.com> wrote:
>>                   Hi.
>>
>>                   I've committed a change to the inbox with changes to 
>> allow getting/putting
>>                   64bit values to ByteArrays (similar to 32 and 16 bit 
>> accessors).  Could this
>>                   be added to trunk?
>>
>>                   Also, first time I used the selective commit function - 
>> very nice!  the
>>                   changes I didn't want committed didn't, in fact, get 
>> commited.  Just the
>>                   desirable bits!
>>
>>                   -cbc
>> 
>> 
>> 
>> 
>> 
>> 
>> 
>> 
>


More information about the Squeak-dev mailing list