Float bug toolkit: what the hash is this?

David N. Smith dnsmith at watson.ibm.com
Thu Feb 19 15:59:54 UTC 1998


At 0:38 -0500 2/19/98, Eliot & Linda wrote:
>sqrmax at cvtci.com.ar wrote:
>[snip]
>> Moreover, even if they do have the same hash (due to design,
>> coincidence or whatever), I think that they should not answer true to an
>> equality cross-test, even if they have the 'same values'.
>
>Why?  Why break something as simple and powerful as Smalltalk's
>mixed-mode arithmetic because certain number representations are
>approximations that must be used with care in certain contexts?
>
>What do the language designers have to say about the Number system?
>Dan?
>
>What do the numerical analysts have to say about floating-point and
>numbers?  Dave?
>
>What do the usability/comprehensibility experts have to say? Alan?
>
>I absolutely concede that if one is using floating-point to compute
>results that are required to exhibit a given accuracy then one will have
>to be very aware of the representations' limitations.  But I don't see
>that this means I have to quarantine more quotidian use.  It appears to
>me that IEEE strives to be usable.  Rounding modes seem to be good
>enough to save embarrassment in many cases.  For example, I tried the
>following in VisualWorks:
>
>	0.01 * 100.0 - 1.0
>
>and I got 0.0.  Not epsilon, but 0.0.  This is presumably because some
>effort has gone into the design of rounding modes.  Hence I rarely hang
>myself with IEEE.
> ...SNIP...


I'm not a numerical analyst, just a one-time heavy user of floats who'd
like to see Smalltalk get floats right.


*** SOAPBOX WARNING ON ***

Your example was lucky! I first suspected that rounding during printing
was hiding a nonzero result, but here is what I got when I looked at the
raw bits in Squeak.

   (0.01 * 100.0 - 1.0) hex
    '0000000000000000'

   (0.07 * 100.0 - 7.0) hex
    '3CD0000000000000'

(The first 12 bits are the sign and the exponent; the exponent is the right
11 of those bits and is biased by 16r3FF. The stored fraction is all zeros,
but there is an invisible leading 1 bit at the left; thus the significand
is really 53 bits, not the 52 that show.)


Note that the example fails miserably for the last case, leaving a 1 bit
near the right edge.
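The same bit patterns can be reproduced outside Smalltalk. A minimal sketch in Python, assuming only that Python's float is an IEEE 754 double (true on all mainstream platforms); float_hex and decode are hypothetical helper names, not part of Squeak or VisualWorks:

```python
import struct

def float_hex(x: float) -> str:
    """Return the 64-bit IEEE 754 pattern of x as 16 hex digits, like Squeak's #hex above."""
    (bits,) = struct.unpack(">Q", struct.pack(">d", x))
    return f"{bits:016X}"

def decode(x: float) -> tuple:
    """Split a normal double into (sign, unbiased exponent, 52 stored fraction bits)."""
    (bits,) = struct.unpack(">Q", struct.pack(">d", x))
    sign = bits >> 63                      # 1 sign bit
    biased_exp = (bits >> 52) & 0x7FF      # 11 exponent bits, biased by 0x3FF
    fraction = bits & ((1 << 52) - 1)      # 52 stored bits; the leading 1 is implicit
    return sign, biased_exp - 0x3FF, fraction

print(float_hex(0.01 * 100.0 - 1.0))   # 0000000000000000
print(float_hex(0.07 * 100.0 - 7.0))   # 3CD0000000000000
print(decode(0.07 * 100.0 - 7.0))      # (0, -50, 0), i.e. 1.0 * 2**-50
```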

An aside: Uh, you don't see a one bit near the right edge? It's that
invisible leading 1 bit, shifted to the right: its value, 2^-50, is the
lowest-order bit of a 53-bit number near 7.0. The example below multiplies
the result from above by 2^50; -50 is the exponent, in bits, of that
invisible leading 1 bit, but we have to reverse the sign for #bitShift:.

   (0.07 * 100.0 - 7.0  * (1 bitShift: 16r3FF - 16r3CD))
      asInteger printStringBase: 16
    '16r1'
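The scaling trick can be checked the same way. This Python sketch (math.ulp and math.nextafter need Python 3.9 or later) confirms that the residue is exactly one unit in the last place of 7.0:

```python
import math

residue = 0.07 * 100.0 - 7.0

# Scaling by 2**50 (the shift 16r3FF - 16r3CD) turns the residue into exactly 1.
print(residue * 2**50)                                 # 1.0

# The residue is one unit in the last place (ulp) of 7.0: the lowest bit of
# 7.0's 53-bit significand, i.e. the "1 bit near the right edge".
print(residue == math.ulp(7.0))                        # True

# So 0.07 * 100.0 is simply the next double above 7.0.
print(0.07 * 100.0 == math.nextafter(7.0, math.inf))   # True
```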


In VW 2.5, I got the same results (after adding a hex method and forcing
the results to double):

   (0.01d0 * 100.0 - 1.0) hex
    '0000000000000000'

   (0.07d0 * 100.0 - 7.0) hex
    '3CD0000000000000'


One cannot reliably compare arbitrary floats for exact equality (which is
what you are really doing); it doesn't work in general. Since it doesn't
work in general, one shouldn't pretend that it does.
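If a comparison is really needed, the tolerance has to be stated explicitly. A minimal sketch in Python using math.isclose; the relative tolerance of 1e-9 is an arbitrary choice for illustration, and a real program would pick it from the accuracy its computation actually needs:

```python
import math

product = 0.07 * 100.0
print(product == 7.0)                               # False: off by one ulp
print(0.1 + 0.2 == 0.3)                             # False
# A tolerance-based test states the intended accuracy explicitly.
print(math.isclose(product, 7.0, rel_tol=1e-9))     # True
print(math.isclose(0.1 + 0.2, 0.3, rel_tol=1e-9))   # True
```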

The main use (only use?) of hash for numbers is to produce a value which is
a good first probe into a hashed collection; a good first probe is one that
probably doesn't collide with other values (provided that the collection is
not nearly 'full').
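To see what a bad first probe costs, here is a minimal sketch of an open-addressing table with linear probing. It is an illustration only, not Squeak's HashedCollection code, and total_probes is a hypothetical name. With distinct hashes every key lands on its first probe; when every key hashes to the same value, the n-th insertion needs n probes and the total grows quadratically:

```python
def total_probes(keys, capacity, hash_fn):
    """Insert keys into a linear-probing table; return the total number of probes.

    The first probe is hash_fn(key) % capacity; each collision costs one more probe.
    """
    table = [None] * capacity
    total = 0
    for key in keys:
        i = hash_fn(key) % capacity
        total += 1
        while table[i] is not None:       # slot taken: step to the next one
            i = (i + 1) % capacity
            total += 1
        table[i] = key
    return total

keys = list(range(100))
print(total_probes(keys, 256, hash))          # 100: every first probe hits
print(total_probes(keys, 256, lambda k: 0))   # 5050: 1 + 2 + ... + 100 probes
```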

When floats are constrained to answer the same hash value as the integers
they equal, the simple approach is to convert each float to an integer
(truncating or rounding it) and hash that. This obviously throws away all
the bits in the fraction. Yet, ignoring sign, almost exactly half of all
representable floating-point values lie strictly between zero and 1.0;
truncation maps every one of them to 0, and rounding still maps everything
below 0.5 to 0. So roughly half of all floats answer a zero hash. This is
not a good thing.
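A minimal Python sketch of the collapse, assuming a hypothetical hash that truncates a float to an integer and hashes that integer. Because positive doubles are ordered the same way as their bit patterns, the share of positive finite doubles below 1.0 can be counted exactly from the patterns of 1.0 and +infinity:

```python
import math
import struct

def truncating_hash(x: float) -> int:
    """Hypothetical float hash made to agree with integer hashes by truncation."""
    return hash(math.trunc(x))

def from_bits(bits: int) -> float:
    """Reinterpret a 64-bit pattern as an IEEE 754 double."""
    return struct.unpack(">d", struct.pack(">Q", bits))[0]

# Every float strictly between -1.0 and 1.0 truncates to 0, so all of them collide.
samples = [0.5, 0.25, 0.001, 1e-300, -0.75]
print({truncating_hash(x) for x in samples})   # {0}

ONE = 0x3FF0000000000000        # bit pattern of 1.0
INF = 0x7FF0000000000000        # bit pattern of +infinity
print(from_bits(ONE), from_bits(INF))          # 1.0 inf
# Patterns 0 .. ONE-1 are +0.0, subnormals, and normals below 1.0.
print(ONE / INF)                               # about 0.49976 (= 1023/2047)
```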


I'm not proposing that mixed mode arithmetic be changed, just that hash
values be good hash values. I wonder how many people have given up on
Smalltalk when they tried to scale up from a test case to a production
system and found that performance went to hell. They may not know that it
is bad hashing that did it, but we know and we should make sure that hash
values are good as hash values.
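One way to keep mixed-mode equality and still get good hash values: only floats with integral values must hash like the integers they equal; every other float is free to use all of its bits. A minimal sketch in Python; better_float_hash and its 64-to-32-bit fold are hypothetical illustrations, not an existing Smalltalk method. (Python's own built-in hash, for example, keeps hash(2.0) == hash(2) while giving 0.5 a nonzero hash.)

```python
import math
import struct

def better_float_hash(x: float) -> int:
    """Hypothetical hash: integral floats hash like the equal integer, so 2.0 and 2
    still find each other in a hashed collection; all other floats use their bits."""
    if math.isfinite(x) and x == math.floor(x):
        return hash(int(x))                     # -0.0 becomes 0, matching 0.0
    (bits,) = struct.unpack(">Q", struct.pack(">d", x))
    return (bits ^ (bits >> 32)) & 0xFFFFFFFF   # fold all 64 bits into 32

samples = [0.5, 0.25, 0.125, 0.001, -0.75]
print(better_float_hash(2.0) == hash(2))             # True
print(len({better_float_hash(x) for x in samples}))  # 5: no zero-hash pile-up
```

A collision between a non-integral float and some integer is harmless for correctness, since the two are never equal; it only costs an extra probe.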

I was on this soapbox here last week and had more examples of bad hashing.
I'll spare you a repeat. :-)

*** SOAPBOX WARNING OFF ***

Dave

_______________________________
David N. Smith
IBM T J Watson Research Center
Hawthorne, NY
_______________________________
Any opinions or recommendations
herein are those of the author
and not of his employer.




