[Vm-dev] Re: [squeak-dev] ByteArray accessors for 64-bit manipulation

Nicolas Cellier nicolas.cellier.aka.nice at gmail.com
Tue Sep 1 07:22:22 UTC 2015


Just a side note: big-endianness matters not only for platforms, but also
for data-exchange protocols...
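
To make that concrete: a protocol that mandates big-endian fields has to be encoded and decoded in that byte order whatever the host CPU is. Below is a minimal C sketch of a host-independent 64-bit accessor pair. The names load_be64 and store_be64 are hypothetical, chosen for illustration; they are not taken from the VM or the image.

```c
#include <stdint.h>

/* Hypothetical helpers for illustration only; not part of any VM source. */

/* Read an unsigned 64-bit value stored most-significant byte first.
   Shifting byte by byte gives the same result on any host byte order. */
static uint64_t load_be64(const unsigned char *p)
{
    uint64_t v = 0;
    for (int i = 0; i < 8; i++)
        v = (v << 8) | p[i];
    return v;
}

/* Write an unsigned 64-bit value most-significant byte first. */
static void store_be64(unsigned char *p, uint64_t v)
{
    for (int i = 7; i >= 0; i--) {
        p[i] = (unsigned char)(v & 0xFF);
        v >>= 8;
    }
}
```

Storing 0x0102030405060708 with store_be64 leaves the bytes 01 02 03 04 05 06 07 08 in the buffer, and load_be64 reads the same value back, on little-endian and big-endian hosts alike.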

2015-09-01 4:12 GMT+02:00 Eliot Miranda <eliot.miranda at gmail.com>:

>
> Hi Andres,
>
> > On Aug 31, 2015, at 5:52 PM, Andres Valloud <
> avalloud at smalltalk.comcastbiz.net> wrote:
> >
> > FWIW... IMO it's better to enable access to the relevant compiler
> > intrinsic with platform specific macros, rather than implementing
> > instructions such as Intel's BSWAP or MOVBE by hand.  In HPS, isolating
> > endianness concerns from the large integer arithmetic primitives with such
> > macros enabled 25-40% faster performance on big endian platforms. Just as
> > importantly, the intrinsic approach takes significantly less code to
> > implement.
>
> Makes sense, and the performance increases are impressive.  The only issue
> I have is that the Cog JIT (which would have the easiest time generating
> those intrinsics) currently runs only on little-endian platforms and I
> seriously doubt it will run on a big-endian platform in the next five
> years.  PowerPC is the only possibility I see.  Yes, ARM is bi-endian but
> all the popular applications I know of are little-endian.
>
> VW's a different beast; significant big endian legacy.
>
> But what you say about isolating makes perfect sense.  Thanks
>
> >
> >> On 8/31/15 10:25 , Eliot Miranda wrote:
> >> Hi Chrises,
> >>
> >>     my vote would be to write these as 12 numbered primitives, (2,4 & 8
> >> bytes) * (at: & at:put:) * (big & little endian) because they can be
> >> performance critical and implementing them like this means the maximum
> >> efficiency in both 32-bit and 64-bit Spur, plus the possibility of the
> >> JIT implementing the primitives.
> >>
> >> On Sun, Aug 30, 2015 at 10:01 PM, Chris Cunningham
> >> <cunningham.cb at gmail.com <mailto:cunningham.cb at gmail.com>> wrote:
> >>
> >>    Hi Chris,
> >>
> >>    I'm all for having the fastest version that works in the image.  If you
> >>    could make your version handle endianness, then I'm all for including
> >>    it (at least in the 3 variants that are faster).  My first use for
> >>    this (an interface for KAFKA) apparently requires big-endianness, so I
> >>    really want that supported.
> >>
> >>    It might be best to keep my naming, though - it follows the name
> >>    pattern that is already in the class.  Or will yours also support
> >>    128?
> >>
> >>    -cbc
> >>
> >>    On Sun, Aug 30, 2015 at 2:38 PM, Chris Muller <asqueaker at gmail.com
> >>    <mailto:asqueaker at gmail.com>> wrote:
> >>
> >>        Hi Chris, I think these methods belong in the image with the
> >>        fastest implementation we can do.
> >>
> >>        I implemented 64-bit unsigned access for Ma Serializer back in
> >>        2005.  I modeled my implementation after Andreas' original
> >>        approach which tries to avoid LI arithmetic.  I was curious
> >>        whether your implementations would be faster, because if they
> >>        are then it could benefit Magma.  After loading "Ma Serializer"
> >>        1.5 (or head) into a trunk image, I used the following script to
> >>        take comparison measurements:
> >>
> >>        | smallN largeN maBa cbBa |
> >>        smallN := ((2 raisedTo: 13) to: (2 raisedTo: 14)) atRandom.
> >>        largeN := ((2 raisedTo: 63) to: (2 raisedTo: 64)) atRandom.
> >>        maBa := ByteArray new: 8.
> >>        cbBa := ByteArray new: 8.
> >>        maBa maUint: 64 at: 0 put: largeN.
> >>        cbBa unsignedLong64At: 1 put: largeN bigEndian: false.
> >>        self assert: (cbBa maUnsigned64At: 1) = (maBa unsignedLong64At: 1 bigEndian: false).
> >>        { 'cbc smallN write' -> [ cbBa unsignedLong64At: 1 put: smallN bigEndian: false ] bench.
> >>        'ma smallN write' -> [ cbBa maUint: 64 at: 0 put: smallN ] bench.
> >>        'cbc smallN access' -> [ cbBa unsignedLong64At: 1 bigEndian: false ] bench.
> >>        'ma smallN access' -> [ cbBa maUnsigned64At: 1 ] bench.
> >>        'cbc largeN write' -> [ cbBa unsignedLong64At: 1 put: largeN bigEndian: false ] bench.
> >>        'ma largeN write' -> [ cbBa maUint: 64 at: 0 put: largeN ] bench.
> >>        'cbc largeN access' -> [ cbBa unsignedLong64At: 1 bigEndian: false ] bench.
> >>        'ma largeN access' -> [ cbBa maUnsigned64At: 1 ] bench.
> >>        }
> >>
> >>        Here are the results:
> >>
> >>        'cbc smallN write'->'3,110,000 per second.  322 nanoseconds per run.' .
> >>        'ma smallN write'->'4,770,000 per second.  210 nanoseconds per run.' .
> >>        'cbc smallN access'->'4,300,000 per second.  233 nanoseconds per run.' .
> >>        'ma smallN access'->'16,400,000 per second.  60.9 nanoseconds per run.' .
> >>        'cbc largeN write'->'907,000 per second.  1.1 microseconds per run.' .
> >>        'ma largeN write'->'6,620,000 per second.  151 nanoseconds per run.' .
> >>        'cbc largeN access'->'1,900,000 per second.  527 nanoseconds per run.' .
> >>        'ma largeN access'->'1,020,000 per second.  982 nanoseconds per run.'
> >>
> >>        It looks like your 64-bit access is 86% faster for accessing the
> >>        high end of the 64-bit range, but slower in the other 3 metrics.
> >>        Notably, it was only 14% as fast for writing the high end of the
> >>        64-bit range, and only about 26% as fast for small-number access.
> >>
> >>
> >>        On Fri, Aug 28, 2015 at 6:01 PM, Chris Cunningham
> >>        <cunningham.cb at gmail.com <mailto:cunningham.cb at gmail.com>> wrote:
> >>         > Hi.
> >>         >
> >>         > I've committed a change to the inbox with changes to allow
> >>         > getting/putting 64-bit values to ByteArrays (similar to 32 and
> >>         > 16 bit accessors).  Could this be added to trunk?
> >>         >
> >>         > Also, first time I used the selective commit function - very
> >>         > nice!  The changes I didn't want committed didn't, in fact, get
> >>         > committed.  Just the desirable bits!
> >>         >
> >>         > -cbc
> >>         >
> >>         >
> >>         >
> >>
> >>
> >>
> >>
> >>
> >>
> >>
> >>
> >> --
> >> _,,,^..^,,,_
> >> best, Eliot
> >>
> >>
> >>
>
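
On the intrinsic approach described in the quoted thread: below is a minimal C sketch of isolating the byte swap behind a platform-specific macro, so that accessor code never spells out BSWAP by hand. SWAP64, HOST_IS_BIG_ENDIAN, swap64_fallback and load64 are hypothetical names for illustration, not the macros used in HPS or in the Squeak VM. The sketch assumes a little-endian host whenever the compiler does not define __BYTE_ORDER__.

```c
#include <stdint.h>
#include <string.h>

/* All names below are hypothetical; this is a sketch, not VM source. */

/* Pick the compiler's byte-swap intrinsic where one is known to exist. */
#if defined(__GNUC__) || defined(__clang__)
#  define SWAP64(x) __builtin_bswap64(x)
#elif defined(_MSC_VER)
#  include <stdlib.h>
#  define SWAP64(x) _byteswap_uint64(x)
#else
/* Portable fallback: swap adjacent bytes, then 16-bit pairs, then halves. */
static uint64_t swap64_fallback(uint64_t x)
{
    x = ((x & 0x00FF00FF00FF00FFULL) << 8)  | ((x >> 8)  & 0x00FF00FF00FF00FFULL);
    x = ((x & 0x0000FFFF0000FFFFULL) << 16) | ((x >> 16) & 0x0000FFFF0000FFFFULL);
    return (x << 32) | (x >> 32);
}
#  define SWAP64(x) swap64_fallback(x)
#endif

/* Assumption: hosts whose compiler does not define __BYTE_ORDER__ are
   treated as little-endian. */
#if defined(__BYTE_ORDER__) && (__BYTE_ORDER__ == __ORDER_BIG_ENDIAN__)
#  define HOST_IS_BIG_ENDIAN 1
#else
#  define HOST_IS_BIG_ENDIAN 0
#endif

/* Load 8 bytes in the requested byte order. memcpy avoids unaligned
   access; the swap happens only when the requested order differs from
   the host's. */
static uint64_t load64(const unsigned char *p, int bigEndian)
{
    uint64_t v;
    memcpy(&v, p, sizeof v);
    if ((bigEndian != 0) != HOST_IS_BIG_ENDIAN)
        v = SWAP64(v);
    return v;
}
```

With the endianness decision confined to one macro and one comparison, the same accessor body serves both the big-endian and little-endian variants, which is the code-size benefit mentioned in the thread.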