Hi Chris,

On Tue, Dec 23, 2014 at 12:50 PM, Chris Muller <asqueaker@gmail.com> wrote:
On Mon, Dec 22, 2014 at 3:59 AM, Bert Freudenberg <bert@freudenbergs.de> wrote:
> On 22.12.2014, at 00:13, Levente Uzonyi <leves@elte.hu> wrote:
>>
>> ConverterFloatArray at: 1 put: self; basicAt: 1.
>
> Any reason not to use this in #asIEEE32BitWord? Endianness? Arch-dependency?
>
> I see, it's not thread-safe. This would be:
>
>         (FloatArray new: 1) at: 1 put: self; basicAt: 1.
>
> Might still be faster?

Yes.  Since creating a one-element FloatArray every time did not hurt
the performance of Levente's version too significantly (only 3.7x
instead of 4.0x faster), I decided the cost of the allocation was
preferable to worrying about concurrency.  So I ended up with
Levente's latest, except I can't risk a calculation ending up as -0.0,
so I have to account for that too.  And NaN too.  Thus:

     hashKey32
          | bits |
          self = NegativeInfinity ifTrue: [ ^ 0 ].
          self = Infinity ifTrue: [ ^ 4294967294 ].
          self isNaN ifTrue: [ ^ 4294967295 ]. "NaN ~= NaN, so test with #isNaN"
          self = NegativeZero ifTrue: [ ^ 2147483651 ].
          bits := (FloatArray new: 1) at: 1 put: self; basicAt: 1.
          self < 0.0 ifTrue: [ ^ 4286578688 - bits ].
          ^ 2147483651 + bits
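For anyone following along outside Squeak, here is a sketch of the same mapping in Python (`bits32` and `hash_key32` are my hypothetical names, not Squeak selectors; `struct` supplies the raw IEEE 754 single-precision bit pattern):

```python
import math
import struct

def bits32(f):
    """IEEE 754 single-precision bit pattern of f, as an unsigned 32-bit int."""
    return struct.unpack('>I', struct.pack('>f', f))[0]

def hash_key32(f):
    """Sketch of a transliteration of the hashKey32 method above."""
    if f == float('-inf'):
        return 0
    if f == float('inf'):
        return 4294967294          # 2^32 - 2
    if math.isnan(f):
        return 4294967295          # 2^32 - 1
    if f == 0.0 and math.copysign(1.0, f) < 0:
        return 2147483651          # fold -0.0 onto the same key as 0.0
    b = bits32(f)
    if f < 0.0:
        return 4286578688 - b      # 0xFF800000 (-Infinity's pattern) minus the bits
    return 2147483651 + b

# The keys preserve numerical order across the sign boundary:
assert hash_key32(-2.0) < hash_key32(-1.0) < hash_key32(1.0) < hash_key32(2.0)
```

The subtraction for negatives is the key trick: negative floats have bit patterns that sort in reverse numerical order, so subtracting from -Infinity's pattern flips them back.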

FloatArray basicNew: 1 will be a little bit faster.  Please use hex to make the layout clear.

 
Since there are not a full 32 bits' worth of IEEE 32-bit floats (e.g.,
several thousand bit patterns decode to NaN), it might be wise to move
+Infinity and NaN _down_ a bit from the very maximum, for better
continuity between the float and integer number lines, or for
potential future special-case needs?

In any case, I wanted to at least see if what we have, above, works
for every 32-bit IEEE float.  To verify that, I enumerated all Floats
in numerical order from -Infinity to +Infinity by creating them via
#fromIEEE32Bit: from the appropriate ranges.
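The full sweep is 2^32 conversions; a Python sketch of the same monotonicity check over a small slice of the negative range (`float_from_bits` is my name, assuming big-endian packing via `struct`):

```python
import struct

def float_from_bits(b):
    """Decode a 32-bit IEEE 754 pattern (big-endian) into a Python float."""
    return struct.unpack('>f', struct.pack('>I', b))[0]

# Negative floats run from pattern 0xFF800000 (-Infinity) down to
# 0x80000000 (-0.0): decreasing bit patterns give increasing values.
prev = float('-inf')
for b in range(0xFF800000, 0xFF800000 - 100, -1):
    f = float_from_bits(b)
    assert f >= prev
    prev = f
```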

It hit a snag at 2151677948.  Check this out:

     | this next |
     this := Float fromIEEE32Bit: 2151677949.
     next := Float fromIEEE32Bit: 2151677948.
     self
          assert: next > this;
          assert: ((FloatArray new: 1) at: 1 put: next; basicAt: 1)
               > ((FloatArray new: 1) at: 1 put: this; basicAt: 1)
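Decoding those two bit patterns outside Squeak (a Python sketch; note that both patterns have an all-zero exponent field, i.e. they sit in the subnormal range, where single- and double-precision behavior most often diverges):

```python
import struct

# 2151677949 = 0x803FFFFD and 2151677948 = 0x803FFFFC: sign bit set,
# exponent field zero, so both decode to tiny subnormal negative floats.
this = struct.unpack('>f', struct.pack('>I', 2151677949))[0]
next_ = struct.unpack('>f', struct.pack('>I', 2151677948))[0]

assert this != next_   # distinct as 64-bit doubles
assert next_ > this    # smaller negative pattern = greater value
```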

As I thought, the representations of IEEE floats and FloatArray
floats differ enough that their precisions align differently onto the
32-bit map for these two floats.  The IEEE representations are precise
enough to distinguish these two floats; the FloatArray representations
are not.

Chris, FloatArray stores 32-bit IEEE 754 single-precision floats, while Float represents 64-bit IEEE 754 double-precision floats.  They look like this:

single-precision: sign, 8-bit exponent, 23-bit mantissa
double-precision: sign, 11-bit exponent, 52-bit mantissa
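These field widths are easy to check directly; a Python sketch (the helper names `fields32`/`fields64` are mine) that splits a float into its sign, exponent, and mantissa fields:

```python
import struct

def fields32(f):
    """Split an IEEE 754 single into (sign, exponent, mantissa) fields."""
    b = struct.unpack('>I', struct.pack('>f', f))[0]
    return b >> 31, (b >> 23) & 0xFF, b & 0x7FFFFF

def fields64(f):
    """Split an IEEE 754 double into (sign, exponent, mantissa) fields."""
    b = struct.unpack('>Q', struct.pack('>d', f))[0]
    return b >> 63, (b >> 52) & 0x7FF, b & 0xFFFFFFFFFFFFF

print(fields32(-1.5))  # (1, 127, 4194304)
print(fields64(-1.5))  # (1, 1023, 2251799813685248)
```

The same value has very different raw fields in the two widths: the exponent biases differ (127 vs 1023) and the mantissa carries 23 vs 52 bits of fraction.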

So if you assign a large Float to a FloatArray it will map to Infinity:

((FloatArray new: 1) at: 1 put: 1.0e238; at: 1) => Infinity

and if you assign a small one it will map to zero:

((FloatArray new: 1) at: 1 put: 1.0e-238; at: 1) => 0.0
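The same overflow/underflow behavior can be reproduced outside Squeak; a Python sketch using `ctypes.c_float` to round-trip a double through single-precision storage (`to_single` is my name):

```python
import ctypes

def to_single(f):
    """Round-trip a Python float (a C double) through 32-bit float storage."""
    return ctypes.c_float(f).value

print(to_single(1.0e238))   # too large for single precision -> inf
print(to_single(1.0e-238))  # too small for single precision -> 0.0
print(to_single(0.1))       # nearby single-precision value, not exactly 0.1
```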


That these guys are considered "equal" by the FloatArray is actually
good enough for my indexing requirement, but now I'm looking at the
prim-fail code for FloatArray:

    at: index
         <primitive: 'primitiveAt' module: 'FloatArrayPlugin'>
         ^Float fromIEEE32Bit: (self basicAt: index)

If this or the #at:put: primitive were ever to fail on the storage
(#at:put:) side but not the access (#at:) side, or vice versa, then it
appears FloatArray itself would retrieve a value different from the
one stored..!

But that happens whenever you store a double that cannot be represented as a 32-bit float.  That's exactly what we're doing here: mapping 64-bit floats to 32-bit floats, so we expect to retrieve different values than those stored most of the time, (2^32 - 1)/2^32 of the time on average.  Only 1/2^32 of the double-precision floats are exactly representable in 32 bits.
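A quick way to see which doubles survive the round trip, sketched in Python (`round_trips` is my name, not a Squeak selector): only doubles whose value happens to be exactly representable in 32 bits come back unchanged.

```python
import ctypes

def round_trips(f):
    """True iff the double f survives storage as a 32-bit float unchanged."""
    return ctypes.c_float(f).value == f

assert round_trips(1.5)          # small dyadic rationals are exact in 32 bits
assert round_trips(0.25)
assert not round_trips(0.1)      # 0.1 is not exactly representable
assert not round_trips(1.0e238)  # out of single-precision range entirely
```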



--
best,
Eliot