[squeak-dev] ByteArray accessors for 64-bit manipulation
Chris Cunningham
cunningham.cb at gmail.com
Tue Sep 8 22:11:33 UTC 2015
Ok, it makes sense now. Well, it made sense before, I was just slow to see
it.
-cbc
On Tue, Sep 8, 2015 at 9:25 AM, Levente Uzonyi <leves at elte.hu> wrote:
> I forgot to answer your other questions.
>
> On Tue, 8 Sep 2015, Chris Cunningham wrote:
>
> Levente,
>> Interesting. I have a question and a concern about your implementation,
>> though:
>>
>> Question: why in the micro checks, is the Write faster than the Access:
>>
>
> Because Access means allocation of new objects (LargeIntegers), while
> Write simply means copying bytes.
>
> {'cbc smallN write'->'10,400,000 per second. 95.8 nanoseconds per run.'.
>> 'cbc smallN access'->'10,300,000 per second. 97.4 nanoseconds per run.'.
>> 'cbc largeN write'->'12,400,000 per second. 80.4 nanoseconds per run.'.
>> 'cbc largeN access'->'3,920,000 per second. 255 nanoseconds per run.'}.
>> yet in your more thorough benchmark, the Write twice as slow as the
>> Access?
>>
>
> For each pair of lines the first one was measured _before_ my changes, the
> second one was measured _after_ them. The benchmark measures read and write
> together, so 536.109375 stands for the average number of milliseconds to
> read and write ninemillion numbers using your variant of the 64-bit methods
> on my machine.
>
> "unsigned long 64"
>> (put, or Write) "average asFloat 536.109375".
>> (Access) "average asFloat 259.359375"
>> any ideas?
>>
>
> So this is
>
> (read and write Before) "average asFloat 536.109375"
> (read and write After) "average asFloat 259.359375"
>
>
> Levente
>
>
>
>> the concern is that your code is nicely optimized for our current 32bit
>> vm - but once we go to 64 bit, I think it will fail. Should we be
>> concerned?
>>
>> -cbc
>>
>> On Tue, Sep 8, 2015 at 2:42 AM, Levente Uzonyi <leves at elte.hu> wrote:
>> Hi All,
>>
>> A bit later than I wanted to, but I've finally uploaded my versions
>> to the Trunk. I guess I went as far as possible with getting the
>> "fastest implementation".
>> I modified your benchmark to use the same numbers, so that the
>> measurements could be repeated. I got the following:
>>
>> Before:
>> {'cbc smallN write'->'3,710,000 per second. 269 nanoseconds per
>> run.'.
>> 'cbc smallN access'->'12,000,000 per second. 83.4 nanoseconds per
>> run.'.
>> 'cbc largeN write'->'5,430,000 per second. 184 nanoseconds per
>> run.'.
>> 'cbc largeN access'->'1,370,000 per second. 732 nanoseconds per
>> run.'}.
>>
>> After:
>> {'cbc smallN write'->'10,400,000 per second. 95.8 nanoseconds per
>> run.'.
>> 'cbc smallN access'->'10,300,000 per second. 97.4 nanoseconds per
>> run.'.
>> 'cbc largeN write'->'12,400,000 per second. 80.4 nanoseconds per
>> run.'.
>> 'cbc largeN access'->'3,920,000 per second. 255 nanoseconds per
>> run.'}.
>>
>> As you can see, everything became faster except for smallN access.
>> This is the side-effect of optimizing for the average case instead
>> of specific cases - like zero bytes. I decided not to use that
>> trick, because it decreased the overall performance.
>>
>> I also wrote a benchmark which measures reads and writes together.
>> It generates random numbers which can be represented using a given
>> number of bits. The result is an array of run times where values
>> having and odd index belong to big-endian access, and even ones to
>> little-endian.
>>
>> | byteArray inputs random storageBits unsigned |
>> Smalltalk garbageCollect.
>> random := Random seed: 36rSqueak.
>> storageBits := 64.
>> unsigned := true.
>> byteArray := ByteArray new: storageBits // 8 * 2.
>> inputs := Array new: 100000.
>> (2 to: storageBits * 2 + 1) collect: [ :descriptor |
>> "lowest bit describes endianness, the rest the number of
>> bits."
>> | limit bigEndian offset |
>> bigEndian := descriptor odd.
>> limit := 1 << (descriptor >> 1) - 1.
>> unsigned
>> ifTrue: [ offset := -1 ]
>> ifFalse: [ offset := -1- (limit >> 1) ].
>> inputs replace: [ :each | (random nextInt: limit) + offset
>> ].
>> [ 1 to: byteArray size - (storageBits // 8 - 1) do: [
>> :startIndex |
>> 1 to: inputs size do: [ :inputIndex |
>> byteArray
>> unsignedLong64At: startIndex
>> put: (inputs at: inputIndex)
>> bigEndian: bigEndian;
>> unsignedLong64At: startIndex
>> bigEndian: bigEndian ] ] ]
>> timeToRun ].
>>
>> I ran it with various accessors and got the following results:
>>
>> "short"
>> #(28 28 26 26 26 28 26 28 26 28 28 28 26 28 28 28 28 28 28 30 28 28
>> 28 28 28 28 28 28 26 28 28 28) "average asFloat 27.625".
>> #(16 18 18 20 18 20 20 20 18 20 18 18 20 20 20 20 20 20 20 20 18 20
>> 20 20 20 20 20 22 20 20 20 20) "average asFloat 19.5".
>>
>> "long"
>> #(62 62 66 68 68 70 68 70 68 70 68 70 68 70 68 70 68 70 70 74 70 72
>> 70 72 72 74 72 72 70 74 70 72 70 72 72 76 72 76 72 76 72 76 72 74
>> 72 76 70 76 72 76 70 76 72 76 72 74 72 76 72 74 72 76 570 584)
>> "average asFloat 87.28125".
>> #(66 66 70 70 72 72 72 72 72 72 74 72 72 74 72 72 74 72 74 72 72 72
>> 72 72 74 72 74 72 72 72 72 74 72 74 72 72 72 72 72 74 74 72 72 74
>> 74 72 72 72 72 72 72 72 72 72 72 72 72 72 72 72 74 72 116 122)
>> "average asFloat 73.625".
>>
>> "unsigned short"
>> #(18 18 18 20 16 18 18 18 18 18 18 18 18 20 18 20 18 18 18 18 18 20
>> 20 20 20 20 18 20 18 18 18 18) "average asFloat 18.5".
>> #(18 18 18 20 20 18 18 18 18 18 18 18 18 18 18 18 18 18 18 18 18 18
>> 18 18 18 18 18 18 18 18 18 18) "average asFloat 18.125".
>>
>> "unsigned long"
>> #(46 48 48 50 50 50 48 48 50 48 48 48 46 48 46 48 52 54 52 52 52 54
>> 52 54 52 52 54 54 52 54 52 54 58 58 58 58 58 58 58 58 58 58 56 58
>> 60 58 56 56 60 62 60 62 62 62 60 62 60 62 62 62 384 400 520 694)
>> "average asFloat 82.40625".
>> #(62 62 62 64 64 62 62 62 62 64 64 64 64 64 64 64 62 62 64 62 64
>> 62 64 64 64 64 64 64 64 64 64 64 64 64 62 62 64 64 64 64 62 64 64 64
>> 64 64 64 64 64 64 64 64 64 64 64 64 64 64 64 62 100 108 106 298)
>> "average asFloat 69.09375".
>>
>> "unsigned long 64"
>> #(300 300 300 300 300 300 300 300 300 300 300 300 300 298 302 300
>> 312 306 308 310 308 306 308 308 310 308 308 308 310 308 312 308 318
>> 316 314 318 316 316 318 316 318 316 316 316 318 318 316 316 326 324
>> 326 322 326 322 328 324 326 322 326 322 510 520 592 592 634 618
>> 636 640 652 666 642 644 660 648 642 660 652 646 662 658 636 648 626
>> 632 650 628 632 612 632 620 622 636 626 626 644 632 750 748 812
>> 822 828 858 842 862 898 880 896 840 870 896 926 870 1034 846 880
>> 834 876 824 860 818 848 824 826 864 820 848 820 828) "average asFloat
>> 536.109375".
>> #(166 174 168 174 170 176 168 172 166 172 164 170 166 170 166 172
>> 166 170 166 172 166 172 166 170 166 170 164 170 170 170 168 176 164
>> 170 166 172 166 172 164 174 166 170 168 172 166 172 166 172 166 170
>> 164 170 166 172 164 172 166 172 166 170 238 272 264 484 282 344
>> 284 356 292 362 294 364 288 362 292 366 294 368 290 364 294 374 294
>> 374 296 370 294 374 288 370 290 366 290 368 292 364 302 382 304
>> 388 302 390 298 392 298 384 302 388 302 390 298 386 308 398 304 400
>> 504 402 298 402 298 398 302 398 294 400 298 396). "average asFloat
>> 259.359375"
>>
>>
>> Levente
>>
>> On Sun, 30 Aug 2015, Chris Muller wrote:
>>
>> Hi Chris, I think these methods belong in the image with the
>> fastest
>> implementation we can do.
>>
>> I implemented 64-bit unsigned access for Ma Serializer back
>> in 2005.
>> I modeled my implementation after Andreas' original approach
>> which
>> tries to avoid LI arithmetic. I was curious whether your
>> implementations would be faster, because if they are then it
>> could
>> benefit Magma. After loading "Ma Serializer" 1.5 (or head)
>> into a
>> trunk image, I used the following script to take comparison
>> measurements:
>>
>> | smallN largeN maBa cbBa | smallN := ((2 raisedTo: 13) to:
>> (2
>> raisedTo: 14)) atRandom.
>> largeN := ((2 raisedTo: 63) to: (2 raisedTo: 64)) atRandom.
>> maBa := ByteArray new: 8.
>> cbBa := ByteArray new: 8.
>> maBa maUint: 64 at: 0 put: largeN.
>> cbBa unsignedLong64At: 1 put: largeN bigEndian: false.
>> self assert: (cbBa maUnsigned64At: 1) = (maBa
>> unsignedLong64At: 1
>> bigEndian: false).
>> { 'cbc smallN write' -> [ cbBa unsignedLong64At: 1 put: smallN
>> bigEndian: false] bench.
>> 'ma smallN write' -> [cbBa maUint: 64 at: 0 put: smallN ]
>> bench.
>> 'cbc smallN access' -> [ cbBa unsignedLong64At: 1 bigEndian:
>> false. ] bench.
>> 'ma smallN access' -> [ cbBa maUnsigned64At: 1] bench.
>> 'cbc largeN write' -> [ cbBa unsignedLong64At: 1 put: largeN
>> bigEndian: false] bench.
>> 'ma largeN write' -> [cbBa maUint: 64 at: 0 put: largeN ]
>> bench.
>> 'cbc largeN access' -> [ cbBa unsignedLong64At: 1 bigEndian:
>> false ] bench.
>> 'ma largeN access' -> [ cbBa maUnsigned64At: 1] bench.
>> }
>>
>> Here are the results:
>>
>> 'cbc smallN write'->'3,110,000 per second. 322 nanoseconds
>> per run.' .
>> 'ma smallN write'->'4,770,000 per second. 210 nanoseconds
>> per run.' .
>> 'cbc smallN access'->'4,300,000 per second. 233 nanoseconds
>> per run.' .
>> 'ma smallN access'->'16,400,000 per second. 60.9 nanoseconds
>> per run.' .
>> 'cbc largeN write'->'907,000 per second. 1.1 microseconds
>> per run.' .
>> 'ma largeN write'->'6,620,000 per second. 151 nanoseconds
>> per run.' .
>> 'cbc largeN access'->'1,900,000 per second. 527 nanoseconds
>> per run.' .
>> 'ma largeN access'->'1,020,000 per second. 982 nanoseconds
>> per run.'
>>
>> It looks like your 64-bit access is 86% faster for accessing
>> the
>> high-end of the 64-bit range, but slower in the other 3
>> metrics.
>> Noticeably, it was only 14% as fast for writing the high-end
>> of the
>> 64-bit range, and similarly as much slower for small-number
>> access..
>>
>>
>> On Fri, Aug 28, 2015 at 6:01 PM, Chris Cunningham
>> <cunningham.cb at gmail.com> wrote:
>> Hi.
>>
>> I've committed a change to the inbox with changes to
>> allow getting/putting
>> 64bit values to ByteArrays (similar to 32 and 16 bit
>> accessors). Could this
>> be added to trunk?
>>
>> Also, first time I used the selective commit function -
>> very nice! the
>> changes I didn't want committed didn't, in fact, get
>> commited. Just the
>> desirable bits!
>>
>> -cbc
>>
>>
>>
>>
>>
>>
>>
>>
>>
>
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.squeakfoundation.org/pipermail/squeak-dev/attachments/20150908/7ed0d1a1/attachment.htm
More information about the Squeak-dev
mailing list
|