Re: [squeak-dev] ByteArray accessors for 64-bit manipulation

1 Sep 2015

FWIW... IMO it's better to enable access to the relevant compiler
intrinsic through platform-specific macros than to implement
instructions such as Intel's BSWAP or MOVBE by hand.  In HPS, isolating
endianness concerns from the large integer arithmetic primitives with
such macros gave 25-40% faster performance on big-endian platforms.
Just as importantly, the intrinsic approach takes significantly less
code to implement.
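
To make the macro idea concrete, here is a minimal C sketch.  It is not HPS code: the names SWAP64, HOST_TO_LE64 and load_le64 are made up for illustration, and the sketch assumes a little-endian host whenever the compiler does not define __BYTE_ORDER__.  The only compiler facilities it relies on are __builtin_bswap64 (GCC/Clang) and _byteswap_uint64 (MSVC).

```c
#include <stdint.h>
#include <string.h>

/* SWAP64 is a hypothetical macro name: it picks a byte-swap intrinsic
   where one is known, and falls back to portable shifts otherwise. */
#if defined(_MSC_VER)
#  include <stdlib.h>
#  define SWAP64(x) _byteswap_uint64(x)
#elif defined(__GNUC__) || defined(__clang__)
#  define SWAP64(x) __builtin_bswap64(x)
#else
static inline uint64_t swap64_portable(uint64_t x)
{
    x = ((x & 0x00FF00FF00FF00FFULL) << 8)  | ((x >> 8)  & 0x00FF00FF00FF00FFULL);
    x = ((x & 0x0000FFFF0000FFFFULL) << 16) | ((x >> 16) & 0x0000FFFF0000FFFFULL);
    return (x << 32) | (x >> 32);
}
#  define SWAP64(x) swap64_portable(x)
#endif

/* Assumption: a host is treated as little endian unless the compiler
   says otherwise through __BYTE_ORDER__. */
#if defined(__BYTE_ORDER__) && (__BYTE_ORDER__ == __ORDER_BIG_ENDIAN__)
#  define HOST_TO_LE64(x) SWAP64(x)
#else
#  define HOST_TO_LE64(x) (x)
#endif

/* Read a little-endian 64-bit value from possibly unaligned memory.
   memcpy keeps the access well defined; compilers usually reduce it
   to a single load. */
static inline uint64_t load_le64(const unsigned char *p)
{
    uint64_t v;
    memcpy(&v, p, sizeof v);
    return HOST_TO_LE64(v);
}
```

The arithmetic code then only ever calls load_le64 and friends, so the endianness decision lives in one place.
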
On 8/31/15 10:25, Eliot Miranda wrote:
...
Hi Chrises,
My vote would be to write these as 12 numbered primitives, (2, 4 & 8
bytes) * (at: & at:put:) * (big & little endian), because they can be
performance critical and implementing them like this means the maximum
efficiency in both 32-bit and 64-bit Spur, plus the possibility of the
JIT implementing the primitives.

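
As a rough illustration of what the 3 * 2 * 2 family has to compute, here is a small C sketch.  It is not the Squeak VM's primitive interface: load_unsigned, store_unsigned and the two example entry points are hypothetical names, and a real primitive would also have to validate the receiver, the index and the value range before touching memory.

```c
#include <stddef.h>
#include <stdint.h>

/* Read an unsigned integer of `width` bytes (2, 4 or 8) from `bytes`. */
static uint64_t load_unsigned(const unsigned char *bytes, size_t width, int big_endian)
{
    uint64_t value = 0;
    for (size_t i = 0; i < width; i++) {
        /* Visit the most significant byte first. */
        size_t src = big_endian ? i : width - 1 - i;
        value = (value << 8) | bytes[src];
    }
    return value;
}

/* Write the low `width` bytes (2, 4 or 8) of `value` into `bytes`. */
static void store_unsigned(unsigned char *bytes, size_t width, int big_endian, uint64_t value)
{
    for (size_t i = 0; i < width; i++) {
        /* Emit the least significant byte first. */
        size_t dst = big_endian ? width - 1 - i : i;
        bytes[dst] = (unsigned char)(value & 0xFF);
        value >>= 8;
    }
}

/* Two of the twelve specialised entry points, shown for shape only. */
static uint64_t uint64_at_big_endian(const unsigned char *bytes)
{
    return load_unsigned(bytes, 8, 1);
}

static void uint16_at_put_little_endian(unsigned char *bytes, uint16_t value)
{
    store_unsigned(bytes, 2, 0, value);
}
```

With a fixed width and byte order per entry point, a compiler (or a JIT) can unroll the loop completely, which is the point of numbering them separately.
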
On Sun, Aug 30, 2015 at 10:01 PM, Chris Cunningham
<cunningham.cb@gmail.com> wrote:
Hi Chris,

I'm all for having the fastest version that works in the image.  If you
could make your version handle endianness, then I'm all for including
it (at least in the 3 variants that are faster).  My first use for
this (interface for KAFKA) apparently requires big-endianness, so I
really want that supported.

It might be best to keep my naming, though - it follows the name
pattern that is already in the class.  Or will yours also support 128?

-cbc

On Sun, Aug 30, 2015 at 2:38 PM, Chris Muller <asqueaker@gmail.com>
wrote:

    Hi Chris, I think these methods belong in the image with the fastest
    implementation we can do.

    I implemented 64-bit unsigned access for Ma Serializer back in 2005.
    I modeled my implementation after Andreas' original approach which
    tries to avoid LI arithmetic.  I was curious whether your
    implementations would be faster, because if they are then it could
    benefit Magma.  After loading "Ma Serializer" 1.5 (or head) into a
    trunk image, I used the following script to take comparison
    measurements:

    | smallN largeN maBa cbBa |
    smallN := ((2 raisedTo: 13) to: (2 raisedTo: 14)) atRandom.
    largeN := ((2 raisedTo: 63) to: (2 raisedTo: 64)) atRandom.
    maBa := ByteArray new: 8.
    cbBa := ByteArray new: 8.
    maBa maUint: 64 at: 0 put: largeN.
    cbBa unsignedLong64At: 1 put: largeN bigEndian: false.
    self assert: (cbBa maUnsigned64At: 1) = (maBa unsignedLong64At: 1 bigEndian: false).
    { 'cbc smallN write' -> [cbBa unsignedLong64At: 1 put: smallN bigEndian: false] bench.
      'ma smallN write' -> [cbBa maUint: 64 at: 0 put: smallN] bench.
      'cbc smallN access' -> [cbBa unsignedLong64At: 1 bigEndian: false] bench.
      'ma smallN access' -> [cbBa maUnsigned64At: 1] bench.
      'cbc largeN write' -> [cbBa unsignedLong64At: 1 put: largeN bigEndian: false] bench.
      'ma largeN write' -> [cbBa maUint: 64 at: 0 put: largeN] bench.
      'cbc largeN access' -> [cbBa unsignedLong64At: 1 bigEndian: false] bench.
      'ma largeN access' -> [cbBa maUnsigned64At: 1] bench.
    }

    Here are the results:

    'cbc smallN write'->'3,110,000 per second.  322 nanoseconds per run.' .
    'ma smallN write'->'4,770,000 per second.  210 nanoseconds per run.' .
    'cbc smallN access'->'4,300,000 per second.  233 nanoseconds per run.' .
    'ma smallN access'->'16,400,000 per second.  60.9 nanoseconds per run.' .
    'cbc largeN write'->'907,000 per second.  1.1 microseconds per run.' .
    'ma largeN write'->'6,620,000 per second.  151 nanoseconds per run.' .
    'cbc largeN access'->'1,900,000 per second.  527 nanoseconds per run.' .
    'ma largeN access'->'1,020,000 per second.  982 nanoseconds per run.'

    It looks like your 64-bit access is 86% faster for accessing the
    high end of the 64-bit range, but slower in the other 3 metrics.
    Notably, it was only 14% as fast for writing the high end of the
    64-bit range, and only about 26% as fast for small-number access.
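
For anyone who wants to re-derive the percentages, here is a small C sketch that uses only the throughput figures quoted above.  The struct and function names are made up for illustration.

```c
/* One row per benchmark: runs per second for the cbc and ma versions,
   copied from the results listed above. */
struct bench_pair {
    const char *label;
    double cbc_per_second;
    double ma_per_second;
};

const struct bench_pair bench_pairs[] = {
    { "smallN write",   3110000.0,  4770000.0 },
    { "smallN access",  4300000.0, 16400000.0 },
    { "largeN write",    907000.0,  6620000.0 },
    { "largeN access",  1900000.0,  1020000.0 },
};

/* Throughput of the cbc version as a percentage of the ma version.
   A result above 100 means cbc is faster. */
double cbc_percent_of_ma(const struct bench_pair *pair)
{
    return 100.0 * pair->cbc_per_second / pair->ma_per_second;
}
```

The largeN access row comes out at roughly 186%, i.e. 86% faster, and the largeN write row at roughly 14%.
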

    On Fri, Aug 28, 2015 at 6:01 PM, Chris Cunningham
    <cunningham.cb@gmail.com> wrote:
     > Hi.
     >
     > I've committed a change to the inbox with changes to allow
     > getting/putting 64-bit values to ByteArrays (similar to the 32-
     > and 16-bit accessors).  Could this be added to trunk?
     >
     > Also, first time I used the selective commit function - very
     > nice!  The changes I didn't want committed didn't, in fact, get
     > committed.  Just the desirable bits!
     >
     > -cbc

--
_,,,^..^,,,_
best, Eliot