ANSI, =, hash, Integer, Float

Richard A. O'Keefe ok at cs.otago.ac.nz
Tue Dec 17 23:54:21 UTC 2002


Bryce Kampjes <bryce at kampjes.demon.co.uk> asks:
	Given the "Floating Point" discussion, is it possible that
	Floats shouldn't be used as keys in collections?
	
A core property of 'reasonable' floating-point systems
is that there is a wide range of numbers for which floating-point
arithmetic is EXACT.  I'm defining 'reasonable' here to mean
'can be described by the Brown model with parameters that are
not so outrageous the designer should have died of shame';
pretty much anything you are likely to meet these days (and I
am including AS/400 and z/Architecture in the list) counts as
reasonable.

In particular, addition, subtraction, multiplication, comparison,
and remainder (fmod(), drem()) of integers are computed EXACTLY,
up to some bound.  For double precision floats (the things that
Squeak Float maps onto) this is usefully larger than the range
of 32-bit integers.

For integers that might be big for SmallInteger but will fit in
53 bits plus sign, Float is a perfectly good representation with
perfectly good semantics.  Remember, arithmetic on integers in
floating point format (up to the bound) is EXACT; there is nothing
stranger or more dubious about 1.0 + 1.0 = 2.0 than there is about
1 + 1 = 2.

For such numbers, arithmetic and comparison in Float representation
is significantly (factor of 2) to dramatically (factor of 5) faster
than arithmetic and comparion in LargeInteger representation.
(Squeak 3.0 on a G3, LargeIntegers v1.2.9 May 2002 (i) plugin).

Thus there are situations where
 - using Floats as keys is semantically well defined
 - using Floats as keys makes performance sense

In particular, consider timestamps.
There are 365.2422 days per year and 86400 seconds per day
(ignoring leap seconds), so there are 31,556,926.08 seconds per year.
Conveniently, this is about 2**25 seconds.
Now suppose you want to represent timestamps.
I'm not sure how many bits SmallInteger loses out of a 32-bit word;
on a SPARC at least the ideal number would be 2.  Let's say it's 2.
That gives you a range of just 34.02 years.
That's just +/- 17.01 years from your epoch.
The UNIX epoch is 1970.
It's rather later than 1987.
When I called time() just now, I got the answer 16r3DFFB67B.
Assuming 2 bits lost, that's too big to fit as a SmallInteger.
So if you want to use UNIX timestamps as keys, either you
use large integers, or you use Floats.
IEEE Floats can represent timestamps to microsecond precision
for a range of +/- 285.4 years, *exactly*.
(Or to second precision for a range of +/- 285.4 megayears...)
I think a timestamp makes a perfectly reasonable key.

I'm not saying that this is always a good idea, just that it isn't
always a BAD idea, and there is certainly nothing dodgy about floating
point equality as such.

The problem I pointed out is purely and simply this:
    equality between rationals is compatible with rational hashes
AND equality between floats is compatible with float hashes
AND equality between rationals and floats is defined (required by the
    ANSI standard) and behaves just the way a Fortran/Pascal/C hacker
    would expect
BUT rational/float equality is not consistent with rational/float hashes.

A collection with rational keys is not a problem.
A collection with floating point keys is NOT a problem.
A collection with mixed keys *IS* a problem.





More information about the Squeak-dev mailing list