Float bug toolkit: what the hash is this?

Thu Feb 19 16:52:02 UTC 1998

David N. Smith writes:
 > 
 > When floats are constrained to answer the same hash value as integers, one
 > has to round it to the nearest integer. This obviously throws away all the
 > bits in the fraction. Yet, fully half of all floating point values are less
 > than 1.0 but greater than zero, ignoring sign, which means that fully half
 > of floats answer a zero hash. This is not a good thing.
 > 
 > 
 > I'm not proposing that mixed mode arithmetic be changed, just that hash
 > values be good hash values. I wonder how many people have given up on
 > Smalltalk when they tried to scale up from a test case to a production
 > system and found that performance went to hell. They may not know that it
 > is bad hashing that did it, but we know and we should make sure that hash
 > values are good as hash values.
 > 

It's not that bad--one could "just" check whether the float is an
integer (or LargeInteger!), and if so make sure we get the same hash
as for the integer.  Likewise there needs to be a check for Fractions.
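
Here is a rough sketch of that check, in Python rather than Smalltalk
so it can be run as-is.  The names int_hash, fraction_hash and
float_hash are made up for illustration, integers are assumed to hash
to themselves, and the mixing constant is arbitrary; this is not taken
from any actual Smalltalk image.

```python
import math
import struct
from fractions import Fraction

def int_hash(n: int) -> int:
    # Assumption for this sketch: integers hash to themselves.
    return n

def fraction_hash(f: Fraction) -> int:
    # A Fraction that is really an integer must hash like that integer.
    if f.denominator == 1:
        return int_hash(f.numerator)
    # Arbitrary illustrative mix of numerator and denominator.
    return int_hash(f.numerator) ^ (31 * int_hash(f.denominator))

def float_hash(x: float) -> int:
    if not math.isfinite(x):
        # inf and nan equal no Integer or Fraction: hash the raw bits.
        return struct.unpack("<q", struct.pack("<d", x))[0]
    # Every finite float is exactly some integer or fraction, and
    # Fraction(x) recovers that exact value.
    return fraction_hash(Fraction(x))
```

With this, float_hash(3.0) agrees with int_hash(3), float_hash(0.5)
agrees with the hash of 1/2, and floats between 0 and 1 no longer all
land on zero.  The cost is an exact conversion on every hash.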

But here is an alternate scheme.  (Which seems to be what all the smart
people are getting at here :))

Why do we need to consider Floats as being exact, anyway?  If we
think of Floats as representing a range of values, which is how you
should usually be thinking when you compute with them anyway, then you
get easy answers to all these problems.  <>= still make sense between
two floats.  A Float should not be = to any Integer or Fraction, just
like a Dictionary is ~= to any integer.
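
To make that concrete, here is a minimal sketch in Python (not
Smalltalk, so that it runs as-is).  RangeFloat is a hypothetical class
invented for this illustration.  Once a float is never = to an Integer
or Fraction, its hash no longer has to agree with theirs, so it can
simply be built from all the bits.

```python
import struct

class RangeFloat:
    """Hypothetical float that is only ever = to another RangeFloat."""

    def __init__(self, value):
        self.value = float(value)

    def __eq__(self, other):
        # Never equal to an integer, fraction, or anything else.
        if not isinstance(other, RangeFloat):
            return False
        return self.value == other.value

    def __hash__(self):
        # +0.0 and -0.0 compare equal, so they must share a hash.
        if self.value == 0.0:
            return 0
        # Otherwise use the full 64-bit pattern, fraction bits included.
        return struct.unpack("<q", struct.pack("<d", self.value))[0]
```

Here RangeFloat(1.0) == 1 is false, while RangeFloat(0.25) and
RangeFloat(0.5) get different hashes without any special casing.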

Some people will get burnt when 1 ~= 1.0.  But then maybe they will
remember what floating point numbers and "equality" really are.  (Or
maybe they'll say "Smalltalk is stupid" and leave :))

Unfortunately, this viewpoint also makes comparison with < and > of
floats kinda strange.  It's possible for more than one Double to fit
in the range covered by a single Float.  How do you compare these
doubles to the float?  They should really not be =, <, or >.
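
A small Python sketch of the problem, assuming Float means a 32-bit
single and Double a 64-bit double (the helper names single_range and
compare are hypothetical, and only positive, normal, in-range values
are handled).  It computes the range of reals that round to a given
single and refuses to order a double that falls inside it.

```python
import struct

def f32_bits(x):
    # Round x to single precision and return its 32-bit pattern.
    return struct.unpack("<I", struct.pack("<f", x))[0]

def f32_from_bits(b):
    return struct.unpack("<f", struct.pack("<I", b))[0]

def single_range(x):
    # Range covered by the single nearest to x: halfway to each
    # neighbouring single.  Assumes x is positive and normal.
    b = f32_bits(x)
    mid = f32_from_bits(b)
    lo = (f32_from_bits(b - 1) + mid) / 2
    hi = (mid + f32_from_bits(b + 1)) / 2
    return lo, hi

def compare(d, single):
    # "<" or ">" when the double is clearly outside the single's
    # range, None when it falls inside (not =, <, or >).
    lo, hi = single_range(single)
    if d < lo:
        return "<"
    if d > hi:
        return ">"
    return None
```

For example, 1.0 and 1.0 + 2**-30 are different doubles, yet both sit
inside the range of the single 1.0, so compare returns None for each.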

None of these concerns will mess up people employing good programming
practice.  If they want strict equalities, they should almost
certainly not be using floats.  If they want fuzzy comparisons, they
should just be using < and >.  And if they have mixed Doubles and
Floats, uh, hmm, well, they have to be careful.

It would be nice to have a version of <>= that conceptually converts
to Double and uses the midpoint of the represented region, the way the
standard <>= mostly (but not always) work now.  Just like String has =
and sameAs:.

This scheme seems to work, though it means the standard =<> would be
different from what =<> normally do in mixed-format numeric computations.
Thoughts?

Lex