[Vm-dev] Re: [squeak-dev] ByteArray accessors for 64-bit manipulation

Mon Aug 31 22:59:03 UTC 2015

Ok.  After tweaking unsignedLong64At:bigEndian: to avoid as many
LargeInteger shifts as I can, I get:

smallN := ((2 raisedTo: 13) to: (2 raisedTo: 14)) atRandom.
largeN := ((2 raisedTo: 63) to: (2 raisedTo: 64)) atRandom.
cbBa := ByteArray new: 8.
cbBa unsignedLong64At: 1 put: largeN bigEndian: false.
self assert: (cbBa maUnsigned64At: 1) = (cbBa unsignedLong64At: 1
bigEndian: false).
{
'smallN write' ->  (cbBa unsignedLong64At: 1 put: smallN bigEndian: false).
'cbc smallN access' -> [ cbBa unsignedLong64At: 1 bigEndian: false. ] bench.
'ma smallN access' -> [ cbBa maUnsigned64At: 1] bench.
'largeN write' ->  (cbBa unsignedLong64At: 1 put: largeN bigEndian: false).
'cbc largeN access' -> [ cbBa unsignedLong64At: 1 bigEndian: false ] bench.
'ma largeN access' -> [ cbBa maUnsigned64At: 1] bench.
}

 {
'smallN write'->15464 .
'cbc smallN access'->'18,500,000 per second. 54.1 nanoseconds per run.' .
'ma smallN access'->'18,800,000 per second. 53.2 nanoseconds per run.' .
'largeN write'->17835413562943208876 .
'cbc largeN access'->'337,000 per second. 2.97 microseconds per run.' .
'ma largeN access'->'137,000 per second. 7.29 microseconds per run.'
}

So, about 1 nanosecond slower than yours on my machine for small numbers, and
significantly faster than yours on large numbers (and faster than what I had
before, taking about 3/4 of the time it did previously).
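
For anyone following along, here is a minimal sketch of the general idea of
avoiding LargeInteger shifts.  It is an illustration only, not the code in
the inbox package: the variable names and the 3+3+2 byte grouping are
assumptions, chosen so that each partial result stays a SmallInteger on a
32-bit image (31-bit SmallIntegers), leaving only the final combination to
touch LargeIntegers.

```smalltalk
"Sketch only: read 8 little-endian bytes, doing most of the work in SmallIntegers."
| bytes low mid high value |
bytes := #[1 2 3 4 5 6 7 8].

"Each group is at most 24 bits wide, so it fits in a SmallInteger."
low := ((bytes at: 1) bitOr: ((bytes at: 2) bitShift: 8))
	bitOr: ((bytes at: 3) bitShift: 16).
mid := ((bytes at: 4) bitOr: ((bytes at: 5) bitShift: 8))
	bitOr: ((bytes at: 6) bitShift: 16).
high := (bytes at: 7) bitOr: ((bytes at: 8) bitShift: 8).

"Only these last two shifts can produce LargeIntegers."
value := (low bitOr: (mid bitShift: 24)) bitOr: (high bitShift: 48).
value
```

Building the value one byte at a time would instead shift a growing
accumulator seven times, and several of those shifts land in LargeInteger
arithmetic once the accumulator outgrows the SmallInteger range.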

This is in the Inbox as Collections-cbc.651.mcz.

I'll work on the writing speed as well.

-cbc

On Mon, Aug 31, 2015 at 2:49 PM, Chris Cunningham <cunningham.cb at gmail.com>
wrote:

> Hi Chris,
>
> So, I've finally installed the MA Serializer (which I should have done
> first thing), and understand better what is going on.
>
> Your uint:at: is already delegating to various other methods depending on
> the byte size, so introducing unsignedLong64At:bigEndian: would be in line
> with that - as long as we made it as efficient as your version.  Perfect.
> Your uint:at:put: however, writes directly to the ByteArray (and assumes
> littleEndian).  If I/we can make unsignedLong64At:put:bigEndian: work
> as efficiently as uint:at:put:, would that work for you?
>
> -cbc
>
> On Mon, Aug 31, 2015 at 11:35 AM, Chris Muller <asqueaker at gmail.com>
> wrote:
>
>>
>> Sometimes the number of bytes is only known in a variable, so would it
>> be possible to do 4 primitives which accept the number of bits (or
>> bytes) as an argument?  (uint:at: uint:at:put:) * (big endian, little
>> endian)
>>
>>
>> On Mon, Aug 31, 2015 at 12:25 PM, Eliot Miranda <eliot.miranda at gmail.com>
>> wrote:
>> > Hi Chrises,
>> >
>> >     my vote would be to write these as 12 numbered primitives, (2,4 & 8
>> > bytes) * (at: & at:put:) * (big & little endian) because they can be
>> > performance critical and implementing them like this means the maximum
>> > efficiency in both 32-bit and 64-bit Spur, plus the possibility of the
>> > JIT implementing the primitives.
>> >
>> > On Sun, Aug 30, 2015 at 10:01 PM, Chris Cunningham <cunningham.cb at gmail.com> wrote:
>> >>
>> >> Hi Chris,
>> >>
>> >> I'm all for having the fastest version in the image that works.  If you
>> >> could make your version handle endianness, then I'm all for including it
>> >> (at least in the 3 variants that are faster).  My first use for this (an
>> >> interface for KAFKA) apparently requires big-endianness, so I really want
>> >> that supported.
>> >>
>> >> It might be best to keep my naming, though - it follows the name
>> >> pattern that is already in the class.  Or will yours also support 128?
>> >>
>> >> -cbc
>> >>
>> >> On Sun, Aug 30, 2015 at 2:38 PM, Chris Muller <asqueaker at gmail.com> wrote:
>> >>>
>> >>> Hi Chris, I think these methods belong in the image with the fastest
>> >>> implementation we can do.
>> >>>
>> >>> I implemented 64-bit unsigned access for Ma Serializer back in 2005.
>> >>> I modeled my implementation after Andreas' original approach which
>> >>> tries to avoid LI arithmetic.  I was curious whether your
>> >>> implementations would be faster, because if they are then it could
>> >>> benefit Magma.  After loading "Ma Serializer" 1.5 (or head) into a
>> >>> trunk image, I used the following script to take comparison
>> >>> measurements:
>> >>>
>> >>> | smallN largeN maBa cbBa |
>> >>> smallN := ((2 raisedTo: 13) to: (2 raisedTo: 14)) atRandom.
>> >>> largeN := ((2 raisedTo: 63) to: (2 raisedTo: 64)) atRandom.
>> >>> maBa := ByteArray new: 8.
>> >>> cbBa := ByteArray new: 8.
>> >>> maBa maUint: 64 at: 0 put: largeN.
>> >>> cbBa unsignedLong64At: 1 put: largeN bigEndian: false.
>> >>> self assert: (cbBa maUnsigned64At: 1) = (maBa unsignedLong64At: 1
>> >>> bigEndian: false).
>> >>> { 'cbc smallN write' -> [ cbBa unsignedLong64At: 1 put: smallN
>> >>> bigEndian: false] bench.
>> >>> 'ma smallN write' -> [cbBa maUint: 64 at: 0 put: smallN ] bench.
>> >>> 'cbc smallN access' -> [ cbBa unsignedLong64At: 1 bigEndian: false. ]
>> >>> bench.
>> >>> 'ma smallN access' -> [ cbBa maUnsigned64At: 1] bench.
>> >>> 'cbc largeN write' -> [ cbBa unsignedLong64At: 1 put: largeN
>> >>> bigEndian: false] bench.
>> >>> 'ma largeN write' -> [cbBa maUint: 64 at: 0 put: largeN ] bench.
>> >>> 'cbc largeN access' -> [ cbBa unsignedLong64At: 1 bigEndian: false ]
>> >>> bench.
>> >>> 'ma largeN access' -> [ cbBa maUnsigned64At: 1] bench.
>> >>>  }
>> >>>
>> >>> Here are the results:
>> >>>
>> >>> 'cbc smallN write'->'3,110,000 per second.  322 nanoseconds per run.' .
>> >>> 'ma smallN write'->'4,770,000 per second.  210 nanoseconds per run.' .
>> >>> 'cbc smallN access'->'4,300,000 per second.  233 nanoseconds per run.' .
>> >>> 'ma smallN access'->'16,400,000 per second.  60.9 nanoseconds per run.' .
>> >>> 'cbc largeN write'->'907,000 per second.  1.1 microseconds per run.' .
>> >>> 'ma largeN write'->'6,620,000 per second.  151 nanoseconds per run.' .
>> >>> 'cbc largeN access'->'1,900,000 per second.  527 nanoseconds per run.' .
>> >>> 'ma largeN access'->'1,020,000 per second.  982 nanoseconds per run.'
>> >>>
>> >>> It looks like your 64-bit access is 86% faster for accessing the
>> >>> high end of the 64-bit range, but slower in the other 3 metrics.
>> >>> Notably, it was only 14% as fast for writing the high end of the
>> >>> 64-bit range, and only about 26% as fast for small-number access.
>> >>>
>> >>>
>> >>> On Fri, Aug 28, 2015 at 6:01 PM, Chris Cunningham
>> >>> <cunningham.cb at gmail.com> wrote:
>> >>> > Hi.
>> >>> >
>> >>> > I've committed a change to the inbox that allows getting/putting
>> >>> > 64-bit values in ByteArrays (similar to the 32 and 16 bit accessors).
>> >>> > Could this be added to trunk?
>> >>> >
>> >>> > Also, first time I used the selective commit function - very nice!
>> >>> > The changes I didn't want committed didn't, in fact, get committed.
>> >>> > Just the desirable bits!
>> >>> >
>> >>> > -cbc
>> >>> >
>> >>> >
>> >>> >
>> >>>
>> >>
>> >>
>> >>
>> >>
>> >
>> >
>> >
>> > --
>> > _,,,^..^,,,_
>> > best, Eliot
>> >
>> >
>> >
>>
>
>