[squeak-dev] ByteArray accessors for 64-bit manipulation

Chris Muller asqueaker at gmail.com
Tue Sep 8 19:30:13 UTC 2015


(Ha!  What an amazing thread!)  Levente, why am I not surprised that
you could manage to squeeze yet even more efficiency out of it.  If
there were an eXtreme sports for programming, you'd win the gold
medal...   :)

I've got to try this on the MagmaBenchmarker right now, I'll let you
know in a bit..

On Tue, Sep 8, 2015 at 4:42 AM, Levente Uzonyi <leves at elte.hu> wrote:
> Hi All,
>
> A bit later than I wanted to, but I've finally uploaded my versions to the
> Trunk. I guess I went as far as possible with getting the "fastest
> implementation".
> I modified your benchmark to use the same numbers, so that the measurements
> could be repeated. I got the following:
>
> Before:
> {'cbc smallN write'->'3,710,000 per second. 269 nanoseconds per run.'.
> 'cbc smallN access'->'12,000,000 per second. 83.4 nanoseconds per run.'.
> 'cbc largeN write'->'5,430,000 per second. 184 nanoseconds per run.'.
> 'cbc largeN access'->'1,370,000 per second. 732 nanoseconds per run.'}.
>
> After:
> {'cbc smallN write'->'10,400,000 per second. 95.8 nanoseconds per run.'.
> 'cbc smallN access'->'10,300,000 per second. 97.4 nanoseconds per run.'.
> 'cbc largeN write'->'12,400,000 per second. 80.4 nanoseconds per run.'.
> 'cbc largeN access'->'3,920,000 per second. 255 nanoseconds per run.'}.
>
> As you can see, everything became faster except for smallN access. This is
> the side-effect of optimizing for the average case instead of specific cases
> - like zero bytes. I decided not to use that trick, because it decreased the
> overall performance.
>
> I also wrote a benchmark which measures reads and writes together. It
> generates random numbers which can be represented using a given number of
> bits. The result is an array of run times where values having and odd index
> belong to big-endian access, and even ones to little-endian.
>
> | byteArray inputs random storageBits unsigned |
> Smalltalk garbageCollect.
> random := Random seed: 36rSqueak.
> storageBits := 64.
> unsigned := true.
> byteArray := ByteArray new: storageBits // 8 * 2.
> inputs := Array new: 100000.
> (2 to: storageBits * 2 + 1) collect: [ :descriptor |
>         "lowest bit describes endianness, the rest the number of bits."
>         | limit bigEndian offset |
>         bigEndian := descriptor odd.
>         limit := 1 << (descriptor >> 1) - 1.
>         unsigned
>                 ifTrue: [ offset := -1 ]
>                 ifFalse: [ offset := -1- (limit >> 1) ].
>         inputs replace: [ :each | (random nextInt: limit) + offset ].
>         [ 1 to: byteArray size - (storageBits // 8 - 1) do: [ :startIndex |
>                 1 to: inputs size do: [ :inputIndex |
>                         byteArray
>                                 unsignedLong64At: startIndex
>                                         put: (inputs at: inputIndex)
>                                         bigEndian: bigEndian;
>                                 unsignedLong64At: startIndex
>                                         bigEndian: bigEndian ] ] ] timeToRun
> ].
>
> I ran it with various accessors and got the following results:
>
> "short"
> #(28 28 26 26 26 28 26 28 26 28 28 28 26 28 28 28 28 28 28 30 28 28 28 28 28
> 28 28 28 26 28 28 28) "average asFloat 27.625".
> #(16 18 18 20 18 20 20 20 18 20 18 18 20 20 20 20 20 20 20 20 18 20 20 20 20
> 20 20 22 20 20 20 20) "average asFloat 19.5".
>
> "long"
> #(62 62 66 68 68 70 68 70 68 70 68 70 68 70 68 70 68 70 70 74 70 72 70 72 72
> 74 72 72 70 74 70 72 70 72 72 76 72 76 72 76 72 76 72 74 72 76 70 76 72 76
> 70 76 72 76 72 74 72 76 72 74 72 76 570 584) "average asFloat 87.28125".
> #(66 66 70 70 72 72 72 72 72 72 74 72 72 74 72 72 74 72 74 72 72 72 72 72 74
> 72 74 72 72 72 72 74 72 74 72 72 72 72 72 74 74 72 72 74 74 72 72 72 72 72
> 72 72 72 72 72 72 72 72 72 72 74 72 116 122) "average asFloat 73.625".
>
> "unsigned short"
> #(18 18 18 20 16 18 18 18 18 18 18 18 18 20 18 20 18 18 18 18 18 20 20 20 20
> 20 18 20 18 18 18 18) "average asFloat 18.5".
> #(18 18 18 20 20 18 18 18 18 18 18 18 18 18 18 18 18 18 18 18 18 18 18 18 18
> 18 18 18 18 18 18 18) "average asFloat 18.125".
>
> "unsigned long"
> #(46 48 48 50 50 50 48 48 50 48 48 48 46 48 46 48 52 54 52 52 52 54 52 54 52
> 52 54 54 52 54 52 54 58 58 58 58 58 58 58 58 58 58 56 58 60 58 56 56 60 62
> 60 62 62 62 60 62 60 62 62 62 384 400 520 694) "average asFloat 82.40625".
>  #(62 62 62 64 64 62 62 62 62 64 64 64 64 64 64 64 62 62 64 62 64 62 64 64
> 64 64 64 64 64 64 64 64 64 64 62 62 64 64 64 64 62 64 64 64 64 64 64 64 64
> 64 64 64 64 64 64 64 64 64 64 62 100 108 106 298) "average asFloat
> 69.09375".
>
> "unsigned long 64"
> #(300 300 300 300 300 300 300 300 300 300 300 300 300 298 302 300 312 306
> 308 310 308 306 308 308 310 308 308 308 310 308 312 308 318 316 314 318 316
> 316 318 316 318 316 316 316 318 318 316 316 326 324 326 322 326 322 328 324
> 326 322 326 322 510 520 592 592 634 618 636 640 652 666 642 644 660 648 642
> 660 652 646 662 658 636 648 626 632 650 628 632 612 632 620 622 636 626 626
> 644 632 750 748 812 822 828 858 842 862 898 880 896 840 870 896 926 870 1034
> 846 880 834 876 824 860 818 848 824 826 864 820 848 820 828) "average
> asFloat 536.109375".
> #(166 174 168 174 170 176 168 172 166 172 164 170 166 170 166 172 166 170
> 166 172 166 172 166 170 166 170 164 170 170 170 168 176 164 170 166 172 166
> 172 164 174 166 170 168 172 166 172 166 172 166 170 164 170 166 172 164 172
> 166 172 166 170 238 272 264 484 282 344 284 356 292 362 294 364 288 362 292
> 366 294 368 290 364 294 374 294 374 296 370 294 374 288 370 290 366 290 368
> 292 364 302 382 304 388 302 390 298 392 298 384 302 388 302 390 298 386 308
> 398 304 400 504 402 298 402 298 398 302 398 294 400 298 396). "average
> asFloat 259.359375"
>
>
> Levente
>
>
> On Sun, 30 Aug 2015, Chris Muller wrote:
>
>> Hi Chris, I think these methods belong in the image with the fastest
>> implementation we can do.
>>
>> I implemented 64-bit unsigned access for Ma Serializer back in 2005.
>> I modeled my implementation after Andreas' original approach which
>> tries to avoid LI arithmetic.  I was curious whether your
>> implementations would be faster, because if they are then it could
>> benefit Magma.  After loading "Ma Serializer" 1.5 (or head) into a
>> trunk image, I used the following script to take comparison
>> measurements:
>>
>> | smallN largeN maBa cbBa |  smallN := ((2 raisedTo: 13) to: (2
>> raisedTo: 14)) atRandom.
>> largeN := ((2 raisedTo: 63) to: (2 raisedTo: 64)) atRandom.
>> maBa := ByteArray new: 8.
>> cbBa := ByteArray new: 8.
>> maBa maUint: 64 at: 0 put: largeN.
>> cbBa unsignedLong64At: 1 put: largeN bigEndian: false.
>> self assert: (cbBa maUnsigned64At: 1) = (maBa unsignedLong64At: 1
>> bigEndian: false).
>> { 'cbc smallN write' -> [ cbBa unsignedLong64At: 1 put: smallN
>> bigEndian: false] bench.
>> 'ma smallN write' -> [cbBa maUint: 64 at: 0 put: smallN ] bench.
>> 'cbc smallN access' -> [ cbBa unsignedLong64At: 1 bigEndian: false. ]
>> bench.
>> 'ma smallN access' -> [ cbBa maUnsigned64At: 1] bench.
>> 'cbc largeN write' -> [ cbBa unsignedLong64At: 1 put: largeN
>> bigEndian: false] bench.
>> 'ma largeN write' -> [cbBa maUint: 64 at: 0 put: largeN ] bench.
>> 'cbc largeN access' -> [ cbBa unsignedLong64At: 1 bigEndian: false ]
>> bench.
>> 'ma largeN access' -> [ cbBa maUnsigned64At: 1] bench.
>> }
>>
>> Here are the results:
>>
>> 'cbc smallN write'->'3,110,000 per second.  322 nanoseconds per run.' .
>> 'ma smallN write'->'4,770,000 per second.  210 nanoseconds per run.' .
>> 'cbc smallN access'->'4,300,000 per second.  233 nanoseconds per run.' .
>> 'ma smallN access'->'16,400,000 per second.  60.9 nanoseconds per run.' .
>> 'cbc largeN write'->'907,000 per second.  1.1 microseconds per run.' .
>> 'ma largeN write'->'6,620,000 per second.  151 nanoseconds per run.' .
>> 'cbc largeN access'->'1,900,000 per second.  527 nanoseconds per run.' .
>> 'ma largeN access'->'1,020,000 per second.  982 nanoseconds per run.'
>>
>> It looks like your 64-bit access is 86% faster for accessing the
>> high-end of the 64-bit range, but slower in the other 3 metrics.
>> Noticeably, it was only 14% as fast for writing the high-end of the
>> 64-bit range, and similarly as much slower for small-number access..
>>
>>
>> On Fri, Aug 28, 2015 at 6:01 PM, Chris Cunningham
>> <cunningham.cb at gmail.com> wrote:
>>>
>>> Hi.
>>>
>>> I've committed a change to the inbox with changes to allow
>>> getting/putting
>>> 64bit values to ByteArrays (similar to 32 and 16 bit accessors).  Could
>>> this
>>> be added to trunk?
>>>
>>> Also, first time I used the selective commit function - very nice!  the
>>> changes I didn't want committed didn't, in fact, get commited.  Just the
>>> desirable bits!
>>>
>>> -cbc
>>>
>>>
>>>
>>
>>
>


More information about the Squeak-dev mailing list