Float equality? (was: [BUG] Float NaN's)

Wed Sep 15 03:22:25 UTC 2004

"Jarvis, Robert P. (Bob) (Contingent)" <bob.jarvis at timken.com> wrote:
	The best practice I'm aware of for handling equality
	calculations with Floats is avoid them completely.

I do wish that more people would read "What Every Computer Scientist
Should Know About Floating-Point Arithmetic".  There may be a mistake
or two in that title, but you'll certainly find the paper around on
the Web, and Sun used to make a habit of shipping it with their
compilers.  This is not directed at Robert Jarvis, but at his audience.

Floating-point equality tests are in fact perfectly well behaved
(in the absence of NaN, sigh).  More than that, when the operands
and result are integers in the range -(2**53 - 1) .. +(2**53 - 1)
held as IEEE 754 double precision numbers, addition, subtraction,
multiplication, remainder() -- hence also division via
rint((x - remainder(x,y))/y) -- and comparison are EXACT.
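
To make the exactness claim concrete, here is a small sketch in Python
(my choice of language, not anything from the thread).  It assumes that
Python floats are IEEE 754 doubles, that math.remainder behaves like the
C remainder(), and that round() stands in for rint(); the helper name
exact_quotient is made up for illustration.

```python
import math

LIMIT = 2**53 - 1  # largest magnitude in the range quoted above


def exact_quotient(x: float, y: float) -> float:
    """Integer quotient via rint((x - remainder(x, y)) / y).

    math.remainder is the IEEE 754 remainder, so the quotient is
    rounded to the nearest integer rather than truncated.
    """
    return float(round((x - math.remainder(x, y)) / y))


# Integer-valued doubles whose results stay in range: no rounding at all.
total = float(2**52) + float(2**52 - 1)      # 2**53 - 1
product = float(2**26) * float(2**26 - 1)    # 2**52 - 2**26
quotient = exact_quotient(float(LIMIT), 3.0)

# One step outside the range, exactness is gone: 2**53 + 1 is not a double.
past_the_edge = float(2**53) + 1.0

print(int(total) == LIMIT)
print(int(product) == 2**26 * (2**26 - 1))
print(int(quotient))
print(past_the_edge == float(2**53))
```

Note that the quotient is the nearest integer to x/y, so 18.0 and 5.0
give 4.0, not 3.0.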

If you want to test whether a number x is a positive infinity,
then x == Inf is the best way to do it.
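
A minimal Python sketch of that test, again assuming IEEE 754 doubles;
math.inf plays the role of Inf and is_positive_infinity is a name I
invented for the example.  The NaN line shows the one place where
equality stops behaving.

```python
import math


def is_positive_infinity(x: float) -> bool:
    # Plain equality: no tolerance is needed or wanted here.
    return x == math.inf


overflowed = 1e308 * 10.0    # too big for a double, so it becomes +inf
nan = math.inf - math.inf    # an invalid operation yields NaN

print(is_positive_infinity(overflowed))   # True
print(is_positive_infinity(1e308))        # False: large but finite
print(is_positive_infinity(-math.inf))    # False: wrong sign
print(nan == nan)                         # False: NaN equals nothing
```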

There are plenty of examples where floating-point equality is exactly
the right thing to do.  Any blanket ban on floating point equality is
too strict.

The problem is not equality.  The problem is that floating-point
arithmetic is BINARY, not decimal, and it's APPROXIMATE, not exact.
It just plain doesn't do what people expect.  With the possible
exception of absolute value and unary minus, there is NO floating-point
operation which does what a naive user would expect.  And while IEEE
floating-point is bizarre, it isn't outright broken like many of the
hardware floating-point systems that preceded it.
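
The binary-versus-decimal surprise is easy to see for yourself.  This
is a sketch in Python, assuming its floats are IEEE 754 doubles;
Fraction is used only to display the exact value that a double holds.

```python
from fractions import Fraction

# The literal 0.1 is rounded to the nearest double, which is a ratio
# with a power-of-two denominator, not one tenth.
tenth = Fraction(0.1)
print(tenth)
print(tenth == Fraction(1, 10))    # False

# Decimal-looking operands pick up binary rounding errors ...
print(0.1 + 0.2 == 0.3)            # False

# ... while operands that are exact in binary behave as expected.
print(0.5 + 0.25 == 0.75)          # True

# Unary minus and absolute value only flip or clear the sign bit.
print(abs(-0.1) == 0.1)            # True
```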

What I'd really like to get my hands on is the decimal floating-point
arithmetic in the revised IEEE standard.  *That's* the arithmetic you
want for a spreadsheet.  That's the arithmetic you want if people are
not to be tripped up by base 2 -vs- base 10.  In fact you *can* get
your hands on a software implementation if you know where to look, but
wouldn't it be nice to have it going at full hardware speed?
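
As a rough illustration of the spreadsheet point, here is a column of
ten 0.1 entries added up twice in Python.  The decimal module is a
software decimal floating-point implementation; I am using it as a
stand-in for the decimal arithmetic of the revised standard, which is
an assumption about how closely the two correspond.

```python
from decimal import Decimal

binary_total = 0.0
decimal_total = Decimal("0")

# Add ten entries of 0.1, the way a spreadsheet column would.
for _ in range(10):
    binary_total += 0.1
    decimal_total += Decimal("0.1")

print(binary_total == 1.0)            # False: binary rounding accumulates
print(decimal_total == Decimal("1"))  # True: 0.1 is exact in base 10
```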

	You should
	establish what you consider to be an acceptable epsilon value
	based on your understanding of your data and use it as follows:

It's not just your data you have to understand; more generally it is
your algorithm.  Anyone who understands them well enough to choose a
good epsilon already knows how to do the fuzzy comparisons.

		maxEpsilon := 0.000001.
			.
			.
			.
		(f1 - f2) abs < maxEpsilon
			ifTrue: ["f1 and f2 are approximately equal"]
			ifFalse: ["f1 and f2 are not approximately equal"]

Urk.  Absolute tolerances seldom work very well.
See Knuth, The Art of Computer Programming, Volume 2, "Seminumerical
Algorithms" for a thorough discussion of "fuzzy" floating-point comparison.
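
To show why a fixed absolute tolerance misbehaves, here is a Python
sketch that sets the quoted test beside a relative one.  The relative
test uses math.isclose; it is in the spirit of a scaled tolerance but
is not Knuth's definition, and the function names are mine.

```python
import math

MAX_EPSILON = 0.000001  # the absolute tolerance from the quoted snippet


def absolutely_close(f1: float, f2: float, eps: float = MAX_EPSILON) -> bool:
    # Direct transcription of (f1 - f2) abs < maxEpsilon.
    return abs(f1 - f2) < eps


def relatively_close(f1: float, f2: float, rel: float = MAX_EPSILON) -> bool:
    # The tolerance scales with the larger magnitude; abs_tol stays 0.0.
    return math.isclose(f1, f2, rel_tol=rel, abs_tol=0.0)


tiny_a, tiny_b = 1e-9, 2e-9        # differ by a factor of two
big_a, big_b = 1e10, 1e10 + 1.0    # agree to ten significant digits

print(absolutely_close(tiny_a, tiny_b))   # True:  too lenient
print(relatively_close(tiny_a, tiny_b))   # False
print(absolutely_close(big_a, big_b))     # False: too strict
print(relatively_close(big_a, big_b))     # True
```

A purely relative test has its own weak spot: with abs_tol at zero,
nothing except zero compares close to zero.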

The really nasty thing about fuzzy comparisons is that they aren't
transitive:  (x fuzzyEquals: y) and: [y fuzzyEquals: z] does NOT
imply x fuzzyEquals: z.  And yes, I *have* known programs (in APL and
in IBM Prolog) go wrong because their programmers didn't really understand
that they were getting fuzzy comparison and/or didn't appreciate the
consequences.  (What Robert Jarvis is recommending is *explicitly* doing
fuzzy comparison with a *specifically* chosen *local* tolerance, not
implicit fuzzy comparison with a *global* tolerance.  So it should be
less risky.)
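
The failure of transitivity takes only three numbers to demonstrate.
In this Python sketch fuzzy_equals is a hypothetical stand-in for
fuzzyEquals:, and the tolerance of 1.0 is chosen only so that every
value involved is exact in binary.

```python
TOLERANCE = 1.0  # illustrative; any positive tolerance has the same flaw


def fuzzy_equals(a: float, b: float, tol: float = TOLERANCE) -> bool:
    return abs(a - b) < tol


x, y, z = 0.0, 0.75, 1.5

print(fuzzy_equals(x, y))   # True:  0.75 apart
print(fuzzy_equals(y, z))   # True:  0.75 apart
print(fuzzy_equals(x, z))   # False: 1.5 apart
```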

	Do not under any circumstances use floating point numbers in
	financial calculations.  Floats are imprecise, often only
	approximate, and utterly inappropriate for any calculation where
	all the fiddly little decimal places really count.

Except decimal floats.  Addition, subtraction, and multiplication of
in-range numbers stated in decimal with in-range results are *exact*.
We *really* want the new IEEE standard, don't we?
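
Here is a sketch of what "exact when in range" means, using Python's
software decimal module as a stand-in (the prices and the 5-digit
precision are invented for the example).  The last part shows the
other half of the claim: once a result needs more digits than the
format holds, decimal floats round too.

```python
from decimal import Decimal, localcontext

# In range: the results are exactly what you would write on paper.
line_total = Decimal("19.99") * 3
balance = Decimal("0.10") + Decimal("0.20") - Decimal("0.30")

print(line_total == Decimal("59.97"))   # True
print(balance == 0)                     # True
print(0.1 * 3 == 0.3)                   # False with binary doubles

# Out of range: the true product 123.57345 needs 8 digits, but this
# context keeps only 5, so the result is rounded.
with localcontext() as ctx:
    ctx.prec = 5
    rounded = Decimal("123.45") * Decimal("1.001")

print(rounded)                          # 123.57
```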

I wish I understood the ANSI Smalltalk 'ScaledDecimal' interface a bit
better.  I'm not sure I believe all of what I think I do understand.

And of course *some* financial calculations are *supposed* to be
approximate.  The thing is, as always, 'know what you are doing'.

	>This does not require the use of a Float.  In Smalltalk I'd use either a
	>Fraction or a ScaledDecimal.

Unfortunately, there is no ScaledDecimal class in Squeak.
I do have an implementation of ScaledDecimal I wrote for another
Smalltalk, but it would require some work to fit with the traditional
double dispatch, and I am wary about changing the compiler to recognise
ScaledDecimals.  Above all, I'm not sure I've interpreted the standard
correctly.