[Vm-dev] float word order

Sun Apr 19 01:15:46 UTC 2009

Hi All,
    I see that the two 32-bit words of a Float are kept in big-endian
(PowerPC) order on all platforms.  This is a pain for performance and a pain
for code generation in Cog.  For example, with SSE2 it is trivial to swizzle
a PowerPC-layout Float into an xmm register with a single PSHUFD
instruction.  Swizzling on write is tediously verbose, though.  The swizzle
has to be done in the xmm register itself, which destroys the value, so
writing a Float result takes three instructions (shuffle, write, unshuffle).
Yes, OK, two extra instructions are small potatoes, but they're still
starch.  So I wonder what the impact would be of keeping Floats in platform
order.  There are a number of possible solutions.
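To make the cost concrete, here is a minimal C sketch of the conversion the current layout implies.  It assumes a Float body is two 32-bit words, with word 0 holding the most significant half and each word in host byte order.  The function names are hypothetical, not existing VM entry points.  Cog would do the equivalent in generated machine code (PSHUFD on x86), not in C.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: read a Float kept in PowerPC word order
   (word 0 = most significant 32 bits, each word in host byte order)
   into a native double. The bits are assembled arithmetically, so this
   works on either host endianness, assuming double and uint64_t share
   byte order (true on x86 and PowerPC). */
static double storedFloatToDouble(const uint32_t words[2]) {
    uint64_t bits = ((uint64_t)words[0] << 32) | (uint64_t)words[1];
    double d;
    memcpy(&d, &bits, sizeof d); /* reinterpret the IEEE 754 bit pattern */
    return d;
}

/* Hypothetical helper: write a native double back in PowerPC word order.
   On a little-endian host this is the "swizzle on write": the two halves
   end up exchanged relative to the double's native memory layout. */
static void doubleToStoredFloat(double d, uint32_t words[2]) {
    uint64_t bits;
    memcpy(&bits, &d, sizeof bits);
    words[0] = (uint32_t)(bits >> 32);         /* most significant half */
    words[1] = (uint32_t)(bits & 0xFFFFFFFFu); /* least significant half */
}
```

For example, 1.5 (bit pattern 0x3FF8000000000000) is stored as the words {0x3FF80000, 0x00000000}.  With platform-order Floats on a little-endian host, both helpers would reduce to a plain 8-byte load or store.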

1. Floats are always in platform order and are swizzled on image load when
moving from little-endian to big-endian or vice versa.  Image code must be
rewritten to take the platform's endianness into account.  (Requires an
image rewrite.)
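A minimal C sketch of the per-Float load-time work in option 1.  Reversing all eight bytes of a double is the same as reversing the bytes of each 32-bit half and then exchanging the halves.  The sketch does both steps for one Float body.  It assumes, without confirming it, that the loader already byte-reverses every word of non-byte objects from an opposite-endian image.  In that case only the final exchange would be new work.  The function names are hypothetical.

```c
#include <stdint.h>

/* Reverse the bytes of one 32-bit word (the per-word step applied to
   non-byte objects loaded from an image of the opposite endianness). */
static uint32_t reverseBytes32(uint32_t w) {
    return (w >> 24)
         | ((w >> 8) & 0x0000FF00u)
         | ((w << 8) & 0x00FF0000u)
         | (w << 24);
}

/* Hypothetical option-1 fixup for one platform-order Float body loaded
   from an opposite-endian image: byte-reverse each 32-bit half, then
   exchange the halves. Together these reverse all 8 bytes, leaving the
   double in host order. */
static void swizzleFloatBodyOnLoad(uint32_t body[2]) {
    uint32_t first = reverseBytes32(body[0]);
    uint32_t second = reverseBytes32(body[1]);
    body[0] = second;
    body[1] = first;
}
```

For example, 1.5 saved on a big-endian host arrives on a little-endian host as the words {0x0000F83F, 0x00000000}.  It leaves the fixup as {0x00000000, 0x3FF80000}, which is how a little-endian host lays out 1.5 natively.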

2. As for 1, but the image is isolated from the change by providing two
primitives, primitiveFloatAt and primitiveFloatAtPut, which implement the
selectors at:, basicAt:, at:put: and basicAt:put: in Float.  These
primitives map index 1 onto the most significant word and index 2 onto the
least significant word.  (Requires no image rewrite, but does require a
file-in of the four methods.)
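A minimal C sketch of the index mapping in option 2.  It assumes the Float body is a native double in platform order.  floatAt and floatAtPut are hypothetical stand-ins for primitiveFloatAt and primitiveFloatAtPut, and they answer false where the real primitive would fail.  They split the bit pattern arithmetically, so index 1 is the most significant word on either endianness.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model of primitiveFloatAt: answer 32-bit word `index`
   (1-based) of a platform-order Float, where 1 = most significant and
   2 = least significant. */
static bool floatAt(const double *f, int index, uint32_t *result) {
    uint64_t bits;
    if (index != 1 && index != 2) return false; /* primitive fails */
    memcpy(&bits, f, sizeof bits);
    *result = index == 1 ? (uint32_t)(bits >> 32) : (uint32_t)bits;
    return true;
}

/* Hypothetical model of primitiveFloatAtPut: replace one 32-bit half. */
static bool floatAtPut(double *f, int index, uint32_t value) {
    uint64_t bits;
    if (index != 1 && index != 2) return false; /* primitive fails */
    memcpy(&bits, f, sizeof bits);
    if (index == 1)
        bits = (bits & 0x00000000FFFFFFFFull) | ((uint64_t)value << 32);
    else
        bits = (bits & 0xFFFFFFFF00000000ull) | value;
    memcpy(f, &bits, sizeof bits);
    return true;
}
```

For 1.5, index 1 yields 0x3FF80000 and index 2 yields 0.  These are the same words that at: 1 and at: 2 answer in today's PowerPC-order images, so existing image code would see no difference.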

3. As for 1, but the image is isolated from the change by providing four
primitives: primitiveFloatLowWord, primitiveFloatLowWordPut,
primitiveFloatHighWord and primitiveFloatHighWordPut.  (Requires as much of
a rewrite of image code as 1.)

4. As for 1, but provide two primitives, primitiveFloatBits and
primitiveFloatBitsPut, which answer or store the Float's bits as a 64-bit
non-negative integer.  (Requires as much of a rewrite of image code as 1,
but is cleaner and scales to 128-bit floats.)
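Option 4 amounts to exposing the IEEE 754 bit pattern as one unsigned integer.  A minimal C sketch follows, with the hypothetical names floatBits and floatFromBits standing in for primitiveFloatBits and primitiveFloatBitsPut.  It also shows why the answer is naturally non-negative.  The sign bit becomes bit 63 of an unsigned value, so -0.0 answers 2^63, not a negative number.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical model of primitiveFloatBits: answer the 64-bit pattern of
   a double as an unsigned (hence non-negative) integer. The word order
   of the stored Float no longer matters to the caller, assuming double
   and uint64_t share byte order on the host. */
static uint64_t floatBits(double d) {
    uint64_t bits;
    memcpy(&bits, &d, sizeof bits);
    return bits;
}

/* Hypothetical model of primitiveFloatBitsPut: build a double from a
   64-bit pattern. */
static double floatFromBits(uint64_t bits) {
    double d;
    memcpy(&d, &bits, sizeof d);
    return d;
}
```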

5. Modify the existing at:[put:] primitives to check for Float receivers
(in our Qwaq images Float has compact class index 6), e.g. in
commonVariable:at:cacheIndex:

    fmt < 8 ifTrue:  "Bitmap (& Float!!)"
        [(self compactClassIndexOf: rcvr) == ClassFloatCompactClassIndex
            ifTrue: ["little-endian host: index 1 (high word) is word 1"
                    result := self fetchLong32: 2 - index ofObject: rcvr]
            ifFalse: [result := self fetchLong32: index - 1 ofObject: rcvr].
         ^self positive32BitIntegerFor: result].

This slows down at: access for Bitmap and complicates an already
overcomplicated, and performance-critical, primitive.

6. Eat it: do the swizzling on every Float access.

6 is apparently painless but actually absurd, because we'd be throwing away
performance for no good reason.

5 is absurd for the same reason, except that the performance is lost in
Bitmap access, not Float access (and Bitmap is used in the VM
simulator ;) ).

2 is my recommendation because, of the solutions that provide maximum
performance, it requires the least effort from adopters.

Opinions & alternatives?  In particular, what problems are likely in moving
to platform Float order?

Best
Eliot