Happy new year everyone. After the recent VWNC discussion about comparing Point and Number, I tried to review my previous work concerning all these minor (but annoying) comparison bugs.
I think I now have an acceptable and consistent solution. It is spread across several Mantis bug reports:
http://bugs.squeak.org/view.php?id=3374 http://bugs.squeak.org/view.php?id=6719 http://bugs.squeak.org/view.php?id=7259 http://bugs.squeak.org/view.php?id=7260
It should also solve:
http://bugs.squeak.org/view.php?id=3360 http://bugs.squeak.org/view.php?id=6601 (I took Andrew's solution from the last one)
Comments, code critiques, flames, etc. are welcome.
Nicolas
nicolas cellier wrote:
I think I now have an acceptable and consistent solution. It is spread across several Mantis bug reports:
Wow. This is quite a bit of work. Here are some comments:
[transitive equality of arithmetic comparison]
I'm a little confused here which version is the "correct" patch. I'm assuming it's the M3374-xxx versions of the files - if not, please let me know.
I generally like the solution proposed for comparisons here, but the coercion methods themselves look a little odd: first, they appear to rely on a method #isFinite which is not defined anywhere; and second, assuming that isFinite returns false for well-defined infinities as in IEEE 754, I am not sure why two infinities should not compare equal. I don't have my IEEE 754 spec handy but at least for Float I think that is well defined. Can you explain your reasoning here a little more?
I really like the fix for Integer/Fraction/Float hash.
[NaN comparisons]
This one I don't like quite as much. How about a much simpler version along the lines of:
Magnitude <= aMagnitude "Answer whether the receiver is less than or equal to the argument."
^(self > aMagnitude or:[aMagnitude isNaN]) not
Etc. This should work just as well and requires only changes in Magnitude instead of polluting too many subclasses (may require promoting isNaN to Magnitude).
Also, out of curiosity, what's your take on the IEEE 754 requirement that NaN compares not equal to itself? This breaks fundamental relationships (for example you can't use NaN in Sets or Dicts) and my feeling has always been that it would be more useful if:
* there was a bit-fiddling test for Float>>isNaN
* NaNs would compare equal to itself (though not necessarily to NaNs with different bit-patterns)
* Standard float prims would *never* return NaN but rather fail
The last one proved tremendously helpful in Croquet where I've been using fdlibm and simply fail any float prims that would produce NaNs. The image can then take care of it properly.
[point-number comparisons]
This is an elegant solution (nice use of adaptTo:andCompare:) but I'm not sure how much code will break with it. Have you tried running this for a while in daily use?
[float prim behavior with NaN]
I'm with you on this one. The primitives are very, very badly broken at this point and I'm really surprised this has escaped us for so long.
All in all, this is great stuff!
Cheers, - Andreas
Andreas Raab a écrit :
nicolas cellier wrote:
I think I now have an acceptable and consistent solution. It is spread across several Mantis bug reports:
Wow. This is quite a bit of work. Here are some comments:
[transitive equality of arithmetic comparison]
I'm a little confused here which version is the "correct" patch. I'm assuming it's the M3374-xxx versions of the files - if not, please let me know.
Yes, the correct version is the one corresponding to last Installer hook in bugnotes.
I generally like the solution proposed for comparisons here, but the coercion methods themselves look a little odd: first, they appear to rely on a method #isFinite which is not defined anywhere; and second, assuming that isFinite returns false for well-defined infinities as in IEEE 754, I am not sure why two infinities should not compare equal. I don't have my IEEE 754 spec handy but at least for Float I think that is well defined. Can you explain your reasoning here a little more?
isFinite is a pre-requisite that can be found at http://bugs.squeak.org/view.php?id=6983
This corresponds to the Installer hook (Installer mantis ensureFix: 6983).
Definition is very simple and answers false both for NaNs and infinities:
isFinite
	^self - self = 0.0
Concerning equality of infinities, I think IEEE 754 says they are equal. Once, I did wonder why. My rationale is: (Float infinity - Float infinity) isNaN. Since the difference of two infinities is not defined, how can we decide they represent equal quantities? I developed this theme at http://bugs.squeak.org/view.php?id=6729
However, I did not change anything there; that would mean changing IEEE, hardware implementations, etc. A bit too much.
The only thing I change is this: (10 raisedTo: 400) < Float infinity. It used to be equal, because its asFloat avatar is infinite...
I really like the fix for Integer/Fraction/Float hash.
Then thank Andrew Tween for the base idea http://bugs.squeak.org/view.php?id=6601
[NaN comparisons]
This one I don't like quite as much. How about a much simpler version along the lines of:
Magnitude <= aMagnitude "Answer whether the receiver is less than or equal to the argument."
^(self > aMagnitude or:[aMagnitude isNaN]) not
Etc. This should work just as well and requires only changes in Magnitude instead of polluting too many subclasses (may require promoting isNaN to Magnitude).
Agreed, this is much simpler. I was probably biased by other situations where there is no total order, like Number and Point comparison.
However there are other cases like these ones where it could serve:
(1/2) < #(0 1 2). "#(false true true)"
(1/2) > #(0 1 2). "Array does not understand <"
(1/2) < '1'. "true"
(1/2) > '1'. "Error: Instances of Fraction are not indexable"
I agree, these are not beautiful examples; some of these functionalities might better make their way out of the image.
Notice that: 3 > '2' and: [3.0 > '2']. "true"
Why not define in Fraction what already is in Integer and Float?
Also, out of curiosity, what's your take on the IEEE 754 requirement that NaN compares not equal to itself? This breaks fundamental relationships (for example you can't use NaN in Sets or Dicts) and my
Gasp, I thought I could retrieve an official reference... successor ISO/IEC 10967-2 is not very verbose about that http://www.cs.chalmers.se/~kent/ISOStandards/SC22/WG11/LIA-2/N424.ps
But you can read page 8 of Kahan's http://www.cs.berkeley.edu/~wkahan/ieee754status/IEEE754.PDF (via http://en.wikipedia.org/wiki/IEEE_754-1985)
Anyway, this is implemented by all libraries I know of:
#include <stdio.h>
int main() {
    double nan;
    nan = 0.0 / 0.0;
    printf("nan == nan is %s\n", (nan == nan) ? "true" : "false");
    return 0;
}
Just try the Smalltalk primitive to confirm... Float nan = Float nan
feeling has always been that it would be more useful if:
- there was a bit-fiddling test for Float>>isNaN
- NaNs would compare equal to itself (though not necessarily to NaNs
with different bit-patterns)
- Standard float prims would *never* return NaN but rather fail
The last one proved tremendously helpful in Croquet where I've been using fdlibm and simply fail any float prims that would produce NaNs. The image can then take care of it properly.
Agreed, I much prefer an Exception (a primitiveFail) to a NaN. NaN is not of any use in Smalltalk, except for hooking up a poor man's external world.
[point-number comparisons]
This is an elegant solution (nice use of adaptTo:andCompare:) but I'm not sure how much code will break with it. Have you tried running this for a while in daily use?
No, I have absolutely no guarantee. My rationale is as follows:
1) old behaviour is rather broken:
(1/2) > (1@3). "MNU"
2 <= (1@3). "true ???"
2 >= (1@3). "true ???"
2) it is easy to revert to a (corrected) coercion behaviour:
Point>>adaptToNumber: rcvr andCompare: comparisonSelector
	^rcvr@rcvr perform: comparisonSelector with: self
However, such coercion would be a typical case where defining Magnitude>>#>= as ^(self > aMagnitude or: [aMagnitude isNaN]) not would not work... (even if Point>>isNaN is defined).
[float prim behavior with NaN]
I'm with you on this one. The primitives are very, very badly broken at this point and I'm really surprised this has escaped us for so long.
All in all, this is great stuff!
Thanks
Cheers,
- Andreas
nicolas cellier wrote:
Yes, the correct version is the one corresponding to last Installer hook in bugnotes.
Ah, yes, sorry I missed that. It's too sprinkled around in the notes ;-)
isFinite is a pre-requisite that can be found at http://bugs.squeak.org/view.php?id=6983
This corresponds to the Installer hook (Installer mantis ensureFix: 6983).
Definition is very simple and answers false both for NaNs and infinities:
isFinite
	^self - self = 0.0
Concerning equality of infinities, I think IEEE 754 says they are equal. Once, I did wonder why. My rationale is: (Float infinity - Float infinity) isNaN. Since the difference of two infinities is not defined, how can we decide they represent equal quantities?
Interesting point. However, I do think that a = a should be true for all objects including Infinity and NaN.
I really like the fix for Integer/Fraction/Float hash.
Then thank Andrew Tween for the base idea http://bugs.squeak.org/view.php?id=6601
Ah, interesting. I had totally missed that.
[NaN comparisons]
[... snip ...]
However there are other cases like these ones where it could serve:
(1/2) < #(0 1 2). "#(false true true)"
(1/2) > #(0 1 2). "Array does not understand <"
(1/2) < '1'. "true"
(1/2) > '1'. "Error: Instances of Fraction are not indexable"
I agree, these are not beautiful examples; some of these functionalities might better make their way out of the image.
Notice that: 3 > '2' and: [3.0 > '2']. "true"
Why not define in Fraction what already is in Integer and Float?
No reason. I wasn't aware that your changes addressed both situations. I was only concerned that addressing the NaN issues would lead to a proliferation of Magnitude subclasses having to implement all of the comparison operations instead of just a limited subset.
Also, out of curiosity, what's your take on the IEEE 754 requirement that NaN compares not equal to itself? This breaks fundamental relationships (for example you can't use NaN in Sets or Dicts) and my
Gasp, I thought I could retrieve an official reference... successor ISO/IEC 10967-2 is not very verbose about that http://www.cs.chalmers.se/~kent/ISOStandards/SC22/WG11/LIA-2/N424.ps
But you can read page 8 of Kahan's http://www.cs.berkeley.edu/~wkahan/ieee754status/IEEE754.PDF (via http://en.wikipedia.org/wiki/IEEE_754-1985)
Anyway, this is implemented by all libraries I know of:
One idle thought I had today was whether one couldn't entirely get rid of float-valued NaNs and instead introduce a well-known NaN instance that *isn't* an instance of Float and consequently also isn't a Number, doesn't do any arithmetic etc. etc. etc.
It would be fairly simple to change the VM to have a well-known NaN slot in the specialObjectsArray so that when a primitive attempts to return a float-valued NaN via pushFloat: the VM instead pushes that NaN object. This would fix 99% of all places that could ever return a float-valued NaN, including things like ByteArray>>doubleAt:, or the FFI etc.
What I have personally no clue about is whether anyone ever uses float-valued NaNs for anything useful. I have never done that but there may be people who have and it would be interesting to see whether they have any insight into the use of a non-float NaN instance.
[point-number comparisons]
This is an elegant solution (nice use of adaptTo:andCompare:) but I'm not sure how much code will break with it. Have you tried running this for a while in daily use?
No, I have absolutely no guarantee.
Sure. I'm just asking for anecdotal evidence in your day-to-day images. Are you actively using these changes?
Cheers, - Andreas
On Tue, Jan 6, 2009 at 8:49 PM, Andreas Raab andreas.raab@gmx.de wrote:
nicolas cellier wrote:
Yes, the correct version is the one corresponding to last Installer hook in bugnotes.
Ah, yes, sorry I missed that. It's too sprinkled around in the notes ;-)
isFinite is a pre-requisite that can be found at
http://bugs.squeak.org/view.php?id=6983
This corresponds to the Installer hook (Installer mantis ensureFix: 6983).
Definition is very simple and answers false both for NaNs and infinities:
isFinite
	^self - self = 0.0
Concerning equality of infinities, I think IEEE 754 says they are equal. Once, I did wonder why. My rationale is: (Float infinity - Float infinity) isNaN. Since the difference of two infinities is not defined, how can we decide they represent equal quantities?
Interesting point. However, I do think that a = a should be true for all objects including Infinity and NaN.
I used to think the same (at least for infinity & infinitesimal) but no longer: 1 / 0 ~= (Float maxVal * 2). At first the IEEE rules may seem counter-intuitive, but they have been well thought out. They provide safety for computations that go out of range. Correctness (in this case lack of false positives) is much more important than "naturalness".
I really like the fix for Integer/Fraction/Float hash.
Then thank Andrew Tween for the base idea http://bugs.squeak.org/view.php?id=6601
Ah, interesting. I had totally missed that.
[NaN comparisons]
[... snip ...]
However there are other cases like these ones where it could serve:
(1/2) < #(0 1 2). "#(false true true)"
(1/2) > #(0 1 2). "Array does not understand <"
(1/2) < '1'. "true"
(1/2) > '1'. "Error: Instances of Fraction are not indexable"
I agree, these are not beautiful examples; some of these functionalities might better make their way out of the image.
Notice that: 3 > '2' and: [3.0 > '2']. "true"
Why not define in Fraction what already is in Integer and Float?
No reason. I wasn't aware that your changes addressed both situations. I was only concerned that addressing the NaN issues would lead to a proliferation of Magnitude subclasses having to implement all of the comparison operations instead of just a limited subset.
Also, out of curiosity, what's your take on the IEEE 754 requirement that
NaN compares not equal to itself? This breaks fundamental relationships (for example you can't use NaN in Sets or Dicts) and my
Gasp, I thought I could retrieve an official reference... successor ISO/IEC 10967-2 is not very verbose about that http://www.cs.chalmers.se/~kent/ISOStandards/SC22/WG11/LIA-2/N424.ps
But you can read page 8 of Kahan's http://www.cs.berkeley.edu/~wkahan/ieee754status/IEEE754.PDF (via http://en.wikipedia.org/wiki/IEEE_754-1985)
Anyway, this is implemented by all libraries I know of:
One idle thought I had today was whether one couldn't entirely get rid of float-valued NaNs and instead introduce a well-known NaN instance that *isn't* an instance of Float and consequently also isn't a Number, doesn't do any arithmetic etc. etc. etc.
It would be fairly simple to change the VM to have a well-known NaN slot in the specialObjectsArray so that when a primitive attempts to return a float-valued NaN via pushFloat: the VM instead pushes that NaN object. This would fix 99% of all places that could ever return a float-valued NaN, including things like ByteArray>>doubleAt:, or the FFI etc.
What I have personally no clue about is whether anyone ever uses float-valued NaNs for anything useful. I have never done that but there may be people who have and it would be interesting to see whether they have any insight into the use of a non-float NaN instance.
The experience with VisualWorks, where we changed over to IEEE (with Inf & NaN), was that those people who really cared about float arithmetic and knew what they were talking about really wanted IEEE behaviour and did not at all want an abstract Infinity, Infinitesimal or NaN. One thing they wanted was easy interoperability (e.g. with high-performance libraries), which means sticking to IEEE. Introducing abstract Infinity/Infinitesimal/NaN complicates the glue and slows it down.
[point-number comparisons]
This is an elegant solution (nice use of adaptTo:andCompare:) but I'm not sure how much code will break with it. Have you tried running this for a while in daily use?
No, I have absolutely no guarantee.
Sure. I'm just asking for anecdotal evidence in your day-to-day images. Are you actively using these changes?
Cheers,
- Andreas
Eliot Miranda wrote:
I used to think the same (at least for infinity & infinitesimal) but no longer: 1 / 0 ~= (Float maxVal * 2). At first the IEEE rules may seem counter-intuitive, but they have been well thought out. They provide safety for computations that go out of range. Correctness (in this case lack of false positives) is much more important than "naturalness".
I agree. But correctness can also mean that neither 1 / 0 nor Float maxVal * 2 should be allowed to successfully execute primitively.
The experience with VisualWorks, where we changed over to IEEE (with Inf & NaN), was that those people who really cared about float arithmetic and knew what they were talking about really wanted IEEE behaviour and did not at all want an abstract Infinity, Infinitesimal or NaN. One thing they wanted was easy interoperability (e.g. with high-performance libraries), which means sticking to IEEE. Introducing abstract Infinity/Infinitesimal/NaN complicates the glue and slows it down.
Interesting data point. I'm really torn in this area - it seems like such a broken behavior to have a ~= a.
Cheers, - Andreas
On 6-Jan-09, at 10:33 PM, Andreas Raab wrote:
Eliot Miranda wrote:
I used to think the same (at least for infinity & infinitesimal) but no longer: 1 / 0 ~= (Float maxVal * 2). At first the IEEE rules may seem counter-intuitive, but they have been well thought out. They provide safety for computations that go out of range. Correctness (in this case lack of false positives) is much more important than "naturalness".
I agree. But correctness can also mean that neither 1 / 0 nor Float maxVal * 2 should be allowed to successfully execute primitively.
The experience with VisualWorks, where we changed over to IEEE (with Inf & NaN), was that those people who really cared about float arithmetic and knew what they were talking about really wanted IEEE behaviour and did not at all want an abstract Infinity, Infinitesimal or NaN. One thing they wanted was easy interoperability (e.g. with high-performance libraries), which means sticking to IEEE. Introducing abstract Infinity/Infinitesimal/NaN complicates the glue and slows it down.
Interesting data point. I'm really torn in this area - it seems like such a broken behavior to have a ~= a.
Cheers,
- Andreas
Also read/see http://en.wikipedia.org/wiki/NaN
However, the other place where I've been burned with this is doing what you think is IEEE math in Smalltalk, then converting some of the code to SQL stored procedures, where the database engine DOES do proper IEEE logic.
You then discover that NaN data poured into the database via some other form of data collection makes the stored procedure code give you different results from the original Smalltalk code, but only in production, late one dark winter night...
--
John M. McIntosh <johnmci@smalltalkconsulting.com>
Corporate Smalltalk Consulting Ltd. http://www.smalltalkconsulting.com
One idle thought I had today was whether one couldn't entirely get rid of float-valued NaNs and instead introduce a well-known NaN instance that *isn't* an instance of Float and consequently also isn't a Number, doesn't do any arithmetic etc. etc. etc.
Another possibility is to have the primitives fail and the fallback code signal a Notification. The notification would by default return a NaN float. It might be okay to use a NaN object too; however, it should behave the same way as IEEE NaNs and would have to be interoperable with FFI and FloatArrays (does Squeak have them?).
I don't think it makes sense to use Integer or Fraction NaNs. In that case you do not have infinities either, and raising hard errors makes more sense. Note that GNU Smalltalk does not even raise a ZeroDivide for floating-point division:
st> 1 / 0
Object: 1 error: The program attempted to divide a number by zero
ZeroDivide(Exception)>>signal (AnsiExcept.st:216)
SmallInteger(Number)>>zeroDivide (AnsiExcept.st:1534)
SmallInteger>>/ (SmallInt.st:277)
UndefinedObject>>executeStatements (a String:1)
nil
st> 1.0 / 0.0
Inf
What I have personally no clue about is whether anyone ever uses float-valued NaNs for anything useful.
It allows delaying error checking to after the end of the computation. Some errors that do not affect the result would be silently ignored; most errors would result in a NaN. Of course Smalltalk floats are so slow that this might not even make a difference (unless you use FFI to interface with external libraries).
With floating-point, I decided that the best course of action is "assume the IEEE-754 people are always right" (and that means W. Kahan mostly).
Paolo
Paolo Bonzini a écrit :
What I have personally no clue about is whether anyone ever uses float-valued NaNs for anything useful.
It allows delaying error checking to after the end of the computation. Some errors that do not affect the result would be silently ignored; most errors would result in a NaN. Of course Smalltalk floats are so slow that this might not even make a difference (unless you use FFI to interface with external libraries).
With floating-point, I decided that the best course of action is "assume the IEEE-754 people are always right" (and that means W. Kahan mostly).
Paolo
This is a wise decision for most current situations.
What is unwise is to not fully support IEEE 754 Exception handling.
Default IEEE 754 rules are not mathematically well founded. They are just pragmatic hacks for solving some common problems nicely and efficiently.
If we have different requirements, we should be able to trap FPU exceptions and provide our own exception handling. And this should be possible from within the image if we claim Smalltalk is a general purpose language.
What I would like is better IEEE 754 integration and portability:
1) a portable way to declare and handle extended double (long double in C)
2) a portable way to get/set the FPU sticky error flags
3) a portable way to activate and catch FPU exceptions
4) a portable way to select the rounding mode
Alas, that's one point where IEEE 754's success was incomplete... Reading the ieee man pages, it seems there are as many interfaces as OS versions... I don't even know which hardware does or does not support these traps... I even wonder whether such features were mandatory... That should not stop us: as suggested, Smalltalk is already slow enough to afford a sticky flag test at each Float primitive in case no hardware trap is available.
With a portable C library, we could afford a few more primitives and decide at image level how to handle these Exceptions (either a primitiveFail or a quiet NaN in case of INVALID operation for example).
I understand this is not a Smalltalk priority; ST is not known as a flying Float cruncher... However, ST's potential markets had better not be restricted by the kernel...
Nicolas
PS: sure, I know gst is among the best behaved (along with STX), having extended double...
nicolas cellier writes:
I don't even know which hardware does or does not support these traps... I even wonder whether such features were mandatory... That should not stop us: as suggested, Smalltalk is already slow enough to afford a sticky flag test at each Float primitive in case no hardware trap is available.
Hopefully Smalltalk will not always be that slow.
All we'd need to do is remove the boxing and unboxing around floating point operations. Inside single statements, that is equivalent to the Boolean optimisations that Exupery already does. Across statements it will require a proper optimiser.
Bryce
bryce@kampjes.demon.co.uk wrote:
nicolas cellier writes:
I don't even know which hardware does or does not support these traps... I even wonder whether such features were mandatory... That should not stop us: as suggested, Smalltalk is already slow enough to afford a sticky flag test at each Float primitive in case no hardware trap is available.
Hopefully Smalltalk will not always be that slow.
All we'd need to do is remove the boxing and unboxing around floating point operations. Inside single statements, that is equivalent to the Boolean optimisations that Exupery already does. Across statements it will require a proper optimiser.
Good support for matrix/vector operations, possibly with an interface to BLAS, seems like a much better plan.
Paolo
<bryce <at> kampjes.demon.co.uk> writes:
nicolas cellier writes:
I don't even know which hardware does or does not support these traps... I even wonder whether such features were mandatory... That should not stop us: as suggested, Smalltalk is already slow enough to afford a sticky flag test at each Float primitive in case no hardware trap is available.
Hopefully Smalltalk will not always be that slow.
All we'd need to do is remove the boxing and unboxing around floating point operations. Inside single statements, that is equivalent to the Boolean optimisations that Exupery already does. Across statements it will require a proper optimiser.
Bryce
Great, but if FPU registers are using extra precision (extended double), then the result of operations might depend on register/memory exchange.
Let's compute (a*b*c) where a, b, c are double precision (Float in Squeak). Let selectors:
- d2e mean (convert double to extended)
- e2d mean (convert extended to double)
For a double d, (d d2e e2d = d) is always true. For an extended e, (e e2d d2e = e) might be false because some bits are lost.
Unoptimized Smalltalk does:
((a d2e * b d2e) e2d d2e * c d2e) e2d.
which should - thanks to IEEE 754 - be strictly equivalent to:
(a * b) * c.
But optimized Smalltalk might evaluate:
((a d2e * b d2e) * c d2e) e2d.
which is no longer equivalent in case (a*b) is inexact... Of course, the result will be more accurate. But it becomes impossible to have bit-identical results independent of the optimization level...
Do you plan to force e2d conversions in generated code? If not, you will trade the bit-identical property for speed (as most x86 C compilers already do). The Smalltalk VM will fail to be hardware independent... (I suspect this is already the case for libm sin, log, etc.)
IMO the user should keep control over precision, and not let uncontrolled tools take fancy decisions. The right way, if we eventually want to profit from this extra precision, is to introduce an ExtendedDouble class (like gst did).
Nicolas
On Jan 8, 2009, at 2:19 AM, Nicolas Cellier wrote:
Let's compute (a*b*c) where a,b,c are double precision (Float in squeak). Let selectors:
- d2e mean (convert double to extended)
- e2d mean (convert extended to double)
For a double d, (d d2e e2d = d) is always true. For an extended e, (e e2d d2e = e) might be false because some bits are lost.
Unoptimized Smalltalk does:
((a d2e * b d2e) e2d d2e * c d2e) e2d.
which should - thanks to IEEE 754 - be strictly equivalent to:
(a * b) * c.
Unfortunately, this is not necessarily so, because of the extra rounding steps involved:
(a * b) performs a double-precision rounding step after the multiply, while ((a d2e * b d2e) e2d) performs an extended-precision rounding step after the multiply, then another double-precision rounding step in the conversion back to double-precision. This will result in a difference in the final values in many cases.
There is a mode bit in x86 to cause the intermediate rounding to be done with double-precision instead of extended-precision, but that does not limit the exponent range to the double-precision range, so denormal results may still be different.
-- Tim Olson
Tim Olson schrieb:
(a * b) performs a double-precision rounding step after the multiply, while ((a d2e * b d2e) e2d) performs an extended-precision rounding step after the multiply, then another double-precision rounding step in the conversion back to double-precision. This will result in a difference in the final values in many cases.
There is a mode bit in x86 to cause the intermediate rounding to be done with double-precision instead of extended-precision, but that does not limit the exponent range to the double-precision range, so denormal results may still be different.
AFAIK, the Croquet project needs to have exactly reproducible float arithmetics on all platforms. Would these rounding effects on the x86 affect that goal (perhaps with denormal intermediate results)? Maybe it would be nice to write a unit test which would uncover different rounding behaviors between purely-double-precision FP hardware and extended-precision hardware.
Cheers, Hans-Martin
On Jan 8, 2009, at 9:30 AM, Hans-Martin Mosner wrote:
AFAIK, the Croquet project needs to have exactly reproducible float arithmetics on all platforms. Would these rounding effects on the x86 affect that goal (perhaps with denormal intermediate results)?
Yes.
Maybe it would be nice to write a unit test which would uncover different rounding behaviors between purely-double-precision FP hardware and extended-precision hardware.
I'd have to play around a bit to get this into a Squeak unit test, but here's a test vector which shows the effect:
3ff3208a25e04e87 * 000316dd1d02d1ae
x86 result: 0003b16ef930a76e powerPC result: 0003b16ef930a76f
These are IEEE double-precision floating-point numbers, specified in hex (big-endian) so that they are bit exact (no conversion error from a decimal representation). The first number is between 1.0 and 2.0, while the second number is a denormal value. The product shows an lsb difference between an x86 platform (extended-precision FPU) and a PPC platform (double-precision FPU), even when the x86 floating-point control word is set to use double-precision rounding for all results.
-- Tim Olson
Tim Olson schrieb:
I'd have to play around a bit to get this into a Squeak unit test, but here's a test vector which shows the effect:
Thanks for that; I've made it into a unit test (see attachment) but at the moment I'm unsure about the course of action that we should take.
First of all, does it matter? If I understand correctly, this behavior is only present for denormalized numbers. Do these appear in real-world cases? My guess is that an appropriate computation using numbers with minimum exponent is the only way of getting denormalized numbers in real-world programs. I'd guess that, for example, with repeated applications of 3D transformation matrices which would combine to an identity transformation if their numbers had infinite precision, you could get into such a situation. Imagine repeated rotations by 10 degrees - after 36 steps you would theoretically have identity, but your actual transformation will be slightly off. This would be a typical Croquet case, and therefore I think that the probability of miscalculation is real.
I've tried to analyze the case in question and came to the following results: the exact mantissa after multiplication is 3B16EF930A76E.80002C69F96C2 (the hex digits after the point are those that should be rounded off when going to a 52-bit mantissa). The result with "A76F" as the last hex digits would therefore be the correct value for an IEEE 754 double precision multiplication (rounding to the nearest representable number), so the PPC implementation does it right. When doing an extended double precision multiplication, there are some more bits in the mantissa, and the mantissa of the intermediate result looks like ...A76E.800, which is exactly in the middle between two representable numbers. Converting to double precision involves rounding the mantissa again, and the rounding rule for this case (exact middle) says to round to the nearest even number, which is ...A76E.
Is it at all possible to get the x86 FPU to produce the correct result for this situation? Interestingly, this article (http://www.vinc17.org/research/extended.en.html) claims that the FPU is set up for incorrect rounding under Linux only, but I could not reproduce the test case given there with Squeak (which probably means that I mistranslated the Java example). I have not yet tried the unit test under Windows to check whether the rounding mode is really set differently there. If the test runs successfully under Windows, there might be a chance of setting the rounding mode of the FPU for correct double precision rounding under Linux, too.
Cheers, Hans-Martin
'From Squeak3.10.2 of ''5 June 2008'' [latest update: #7179] on 9 January 2009 at 8:28:06 am'!

!FloatTest methodsFor: 'IEEE 754' stamp: 'hmm 1/9/2009 08:18'!
testDenormalResultWithExtendedPrecision
	"The x86 FP hardware uses 80-bit internal representation. Example supplied by Tim Olson on the squeak-dev mailing list:

		3ff3208a25e04e87 * 000316dd1d02d1ae
		x86 result:     0003b16ef930a76e
		powerPC result: 0003b16ef930a76f

	These are IEEE double-precision floating-point numbers, specified in hex (big-endian) so that they are bit exact (no conversion error from a decimal representation). The first number is between 1.0 and 2.0, while the second number is a denormal value. The product shows an lsb difference between an x86 platform (extended-precision FPU) and a PPC platform (double-precision FPU), even when the x86 floating-point control word is set to use double-precision rounding for all results."
	| f1 f2 product |
	f1 := 0.0 + 0.0.	"addition answers fresh Float instances, safe to mutate with basicAt:put:"
	f2 := 0.0 + 0.0.
	f1 basicAt: 1 put: 16r3FF3208A; basicAt: 2 put: 16r25E04E87.
	f2 basicAt: 1 put: 16r000316DD; basicAt: 2 put: 16r1D02D1AE.
	product := f1 * f2.
	self assert: (product basicAt: 1) = 16r0003B16E.
	self assert: (product basicAt: 2) = 16rF930A76F! !
On Jan 9, 2009, at 3:08 AM, Paolo Bonzini wrote:
Is it at all possible to get the x86 FPU to produce the correct result for this situation?
You can also try compiling the Squeak VM with -mfpmath=sse.
That should work, as long as you are running it on x86 processors that support SSE2 with double-precision scalar operations. The SSE floating-point unit(s) perform true single- and double-precision arithmetic.
-- tim
On Jan 9, 2009, at 2:29 AM, Hans-Martin Mosner wrote:
First of all, does it matter? If I understand correctly, this behavior is only present for denormalized numbers.
If you set the internal rounding-precision mode in the x86 control register to double-precision, then yes, the multiple rounding issue goes away for most computations. It only remains when generating denormal results because the exponent field size still remains at extended precision during the computation, so the denormalized result is only generated during the conversion back to double-precision format, leading to multiple rounding operations.
Do these appear in real-world cases?
That's hard to say. It might be interesting to instrument the VM to check for denormal operands or results on the float operations to get a feel for how often (if ever) they are occurring.
I've tried to analyze the case in question and came to the following results: The exact mantissa after multiplication is 3B16EF930A76E.80002C69F96C2 (the hex digits after the point are those that should be rounded off when going to a 52-bit mantissa). The result with "A76F" as the last hex digits would therefore be the correct value for an IEEE-754 double precision multiplication (rounding to the nearest representable number), so the PPC implementation does it right. When doing an extended double precision multiplication, there are some more bits in the mantissa, and the mantissa of the intermediate result looks like ...A76E.800 which is exactly in the middle between two representable numbers. Converting to double precision involves rounding the mantissa again, and the rounding rule for this case (exact middle) says to round to the nearest even number, which is ...A76E.
That's sort of what is going on, but it is complicated here due to the way denorms are handled. What is actually happening is:
1st operand (binary): 1.0011001000001000101000100101111000000100111010000111 (note "hidden" 1 bit added to the left of the radix point because it is a normalized value)
2nd operand (binary): 0.0011000101101101110100011101000000101101000110101110 (note no "hidden" 1 bit because it is a denormalized value)
Before multiplication, the 2nd operand is renormalized, adjusting the exponent accordingly: 1.1000101101101110100011101000000101101000110101110000
The exact product is:
1.1101100010110111011111001001100001010011101101110100,0000000000000001...
The comma (,) is at the double-precision rounding position.
PPC operation: The product exponent is smaller than can be represented in a normalized double-precision format, so the result is first denormalized:
0.0011101100010110111011111001001100001010011101101110,1000000000000000001...
Then round-to-nearest rounding adds an ULP because the bits to the right of the rounding position are greater than halfway:
0.0011101100010110111011111001001100001010011101101111
x86 (extended with double-rounding mode):
Because the exponent field is still sized for 80-bit extended floats, the exact product is still representable as a normalized number:
1.1101100010110111011111001001100001010011101101110100,0000000000000001...
Then round-to-nearest drops the bits to the right of the double-precision rounding point without adding an ULP, because the bits to the right are less than halfway:
1.1101100010110111011111001001100001010011101101110100
Then the result is converted to a double-precision representation when storing to memory, which causes it to become denormalized:
0.0011101100010110111011111001001100001010011101101110,100
Round-to-nearest rounding does not add an ULP because the bits to the right of the rounding position are exactly halfway, and in that case the nearest even result is selected:
0.0011101100010110111011111001001100001010011101101110
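Tim's two roundings can be replayed in software. The sketch below (Python; the rounding helpers are mine, written against the rules Tim describes, not code from the thread) reproduces both the single correctly-rounded result and the x86 double-rounded result from exact rational arithmetic:

```python
import struct
from fractions import Fraction

def from_hex_bits(h):
    """Reinterpret a 16-hex-digit string as a big-endian IEEE-754 double."""
    return struct.unpack('>d', bytes.fromhex(h))[0]

def round_nearest_even(x, quantum):
    """Round a positive rational x to the nearest multiple of quantum, ties to even."""
    q = x // quantum                 # floor quotient (an integer)
    r = x - q * quantum              # remainder, 0 <= r < quantum
    if 2 * r > quantum or (2 * r == quantum and q % 2 == 1):
        q += 1
    return q * quantum

def floor_log2(x):
    """floor(log2(x)) for a positive rational."""
    e = x.numerator.bit_length() - x.denominator.bit_length()
    while Fraction(2) ** e > x:
        e -= 1
    while Fraction(2) ** (e + 1) <= x:
        e += 1
    return e

exact = Fraction(from_hex_bits('3ff3208a25e04e87')) * \
        Fraction(from_hex_bits('000316dd1d02d1ae'))

# PPC: one rounding, straight onto the denormal grid (spacing 2^-1074):
denormal_quantum = Fraction(1, 2 ** 1074)
ppc = round_nearest_even(exact, denormal_quantum)
assert ppc == Fraction(from_hex_bits('0003b16ef930a76f'))

# x86 with precision control set to double: first round to a 53-bit
# significand at the product's own exponent (the extended exponent range
# keeps the intermediate normalized), then round again when storing to
# the denormal double format:
e = floor_log2(exact)
intermediate = round_nearest_even(exact, Fraction(2) ** (e - 52))
x86 = round_nearest_even(intermediate, denormal_quantum)
assert x86 == Fraction(from_hex_bits('0003b16ef930a76e'))
```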
Is it at all possible to get the x86 FPU to produce the correct result for this situation?
Not using the extended-precision FPU, without an extreme performance loss. You basically would have to cause an exception for all imprecise results (lots of them!) and check them in software. If you perform the operations in the SSE unit, then that will work, provided all the x86 platforms you run on support SSE2 with double-precision scalars.
Interestingly, this article (http://www.vinc17.org/research/extended.en.html) claims that the FPU is set up for incorrect rounding under Linux only, but I could not reproduce the test case given there with Squeak (which probably means that I mistranslated the Java example).
That example shows the effect for normalized intermediate results when the intermediate result is rounded to extended precision, then double-precision rounding occurs when converting to double-precision format. That can be fixed by setting the rounding-precision mode to double in the fpu control register, but, as shown in the example above, it does not work in the case of denormalized intermediate results.
I suspect that your Squeak VM has the rounding-precision mode set to double, which fixes most of the cases.
-- tim
On 09.01.2009, at 05:31, Tim Olson wrote:
On Jan 8, 2009, at 9:30 AM, Hans-Martin Mosner wrote:
AFAIK, the Croquet project needs to have exactly reproducible float arithmetic on all platforms. Would these rounding effects on the x86 affect that goal (perhaps with denormal intermediate results)?
Yes.
Maybe it would be nice to write a unit test which would uncover different rounding behaviors between purely-double-precision FP hardware and extended-precision hardware.
I'd have to play around a bit to get this into a Squeak unit test, but here's a test vector which shows the effect:
3ff3208a25e04e87 * 000316dd1d02d1ae
x86 result: 0003b16ef930a76e powerPC result: 0003b16ef930a76f
These are IEEE double-precision floating-point numbers, specified in hex (big-endian) so that they are bit exact (no conversion error from a decimal representation). The first number is between 1.0 and 2.0, while the second number is a denormal value. The product shows an lsb difference between an x86 platform (extended-precision FPU) and a PPC platform (double-precision FPU), even when the x86 floating-point control word is set to use double-precision rounding for all results.
On my Intel Mac with VM 3.8.17b6 I get ...a76f too, so this is not an x86 issue.
a := Float newFrom: #(16r3FF3208A 16r25E04E87).
b := Float newFrom: #(16r000316DD 16r1D02D1AE).
{a hex. b hex. (a*b) hex}
#('3FF3208A25E04E87' '000316DD1D02D1AE' '0003B16EF930A76F')
I do get ...a76e under Linux (x86 VMWare emulation).
- Bert -
On Jan 8, 2009, at 9:30 AM, Hans-Martin Mosner wrote:
AFAIK, the Croquet project needs to have exactly reproducible float arithmetic on all platforms.
Are you using the exact same code on all platforms for transcendentals and other floating-point functions? If not, that is going to be the greatest source of different results. They tend to vary widely in accuracy, since there is no 1/2 ULP accuracy requirement for those functions, unlike for the defined IEEE operations (+, -, *, /, sqrt).
-- Tim Olson
On 09.01.2009, at 16:16, Tim Olson wrote:
On Jan 8, 2009, at 9:30 AM, Hans-Martin Mosner wrote:
AFAIK, the Croquet project needs to have exactly reproducible float arithmetic on all platforms.
Are you using the exact same code on all platforms for transcendentals and other floating-point functions? If not, that is going to be the greatest source of different results. They tend to vary widely in accuracy, since there is no 1/2 ULP accuracy requirement for those functions, unlike for the defined IEEE operations (+, -, *, /, sqrt).
Yes, Andreas wrote a new plugin to ensure bit-identical floating point ops across platforms.
- Bert -
Years back, in April of '06, we coded up a FloatMathPlugin to do IEEE math as a front end to http://www.netlib.org/fdlibm/. Testing showed we could do IEEE math using the plugin on linux/windows/mac (powerpc).
I believe Qwaq took over building this for macintel.
Also note the unix and carbon VMs do fiddle with the FP unit control register via this code (I have a copy of it); you would need to check the build tree for the unix version to see if it is different.
#if defined(__GNUC__) && ( defined(i386) || defined(__i386) || defined(__i386__) \
    || defined(i486) || defined(__i486) || defined(__i486__) \
    || defined(intel) || defined(x86) || defined(i86pc) )
static void fldcw(unsigned int cw)
{
    __asm__("fldcw %0" :: "m"(cw));
}
#else
# define fldcw(cw)
#endif
#if defined(__GNUC__) && ( defined(ppc) || defined(__ppc) || defined(__ppc__) \
    || defined(POWERPC) || defined(__POWERPC) || defined(__POWERPC__) )
void mtfsfi(unsigned long long fpscr);
void mtfsfi(unsigned long long fpscr)
{
    __asm__("lfd f0, %0" :: "m"(fpscr));
    __asm__("mtfsf 0xff, f0");
}
#else
# define mtfsfi(fpscr)
#endif
-- John M. McIntosh <johnmci@smalltalkconsulting.com> Corporate Smalltalk Consulting Ltd. http://www.smalltalkconsulting.com
Tim Olson wrote:
On Jan 8, 2009, at 9:30 AM, Hans-Martin Mosner wrote:
AFAIK, the Croquet project needs to have exactly reproducible float arithmetic on all platforms.
Slightly off-topic, but since we have such a knowledgeable crowd...
As noted by John, Croquet uses fdlibm for bit-identical floating point math. Does anyone have a feeling for how difficult (or impossible) it will be to achieve identical computation on OpenCL-compliant devices? For example, how difficult would it be to port, say, fdlibm, so that transcendentals use the exact same code? Any other show-stoppers that might not occur to the naive mind :-) ?
Thanks, Josh
Are you using the exact same code on all platforms for transcendentals and other floating-point functions? If not, that is going to be the greatest source of different results. They tend to vary widely in accuracy, since there is no 1/2 ULP accuracy requirement for those functions, unlike for the defined IEEE operations (+, -, *, /, sqrt).
-- Tim Olson
On 10.01.2009, at 11:01, Josh Gargus wrote:
As noted by John, Croquet uses fdlibm for bit-identical floating point math. Does anyone have a feeling for how difficult (or impossible) it will be to achieve identical computation on OpenCL- compliant devices?
The numerical behavior of compliant OpenCL implementations is covered in section 7 of the OpenCL spec. In particular, table 7.1 gives the error bounds for the various operations. If I interpret that correctly, very few functions are guaranteed to behave bit-identical.
For example, how difficult would it be to port, say, fdlibm, so that transcendentals use the exact same code? Any other show-stoppers that might not occur to the naive mind :-) ?
Well, OpenCL only requires single precision; double-precision support is optional, whereas fdlibm is double-precision only. I don't know how compliant current implementations actually are.
- Bert -
Bert Freudenberg wrote:
On 10.01.2009, at 11:01, Josh Gargus wrote:
As noted by John, Croquet uses fdlibm for bit-identical floating point math. Does anyone have a feeling for how difficult (or impossible) it will be to achieve identical computation on OpenCL-compliant devices?
The numerical behavior of compliant OpenCL implementations is covered in section 7 of the OpenCL spec. In particular, table 7.1 gives the error bounds for the various operations. If I interpret that correctly, very few functions are guaranteed to behave bit-identical.
Oops, I was skimming by the time I read that part of the spec. I saw that the transcendental functions need not return identical results (which was why I mentioned porting fdlibm), but I missed that even x/y isn't precisely specified.
For example, how difficult would it be to port, say, fdlibm, so that transcendentals use the exact same code? Any other show-stoppers that might not occur to the naive mind :-) ?
Well, OpenCL only requires single precision; double-precision support is optional, whereas fdlibm is double-precision only.
I didn't know that. Maybe that's what the "d" stands for in "fdlibm".
I don't know how compliant current implementations actually are.
I think that there's only one implementation right now, and only available to paid-up Apple developers. Anyway, I'm more interested in what the spec says than current conformance... implementations will gradually become more compliant.
Thanks, Josh
- Bert -
Reading the CUDA 2.0 programming guide, I saw an interesting difference between the error-bounds for single- and double-precision floating point:
Double-precision:

    Operation   ULPs
    x + y       0 (IEEE-754 round-to-nearest-even)
    x * y       0 (IEEE-754 round-to-nearest-even)
    x / y       0 (IEEE-754 round-to-nearest-even)
    1 / x       0 (IEEE-754 round-to-nearest-even)
    sqrt(x)     0 (IEEE-754 round-to-nearest-even)

Single-precision:

    Operation   ULPs
    x + y       0 (IEEE-754 round-to-nearest-even)
    x * y       0 (IEEE-754 round-to-nearest-even)
    x / y       2
    1 / x       1
    sqrt(x)     3
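ULP distances like those in the table can be measured by comparing bit patterns: for nonnegative IEEE doubles the encoding, read as an unsigned integer, is monotonic (denormals included), so adjacent representable values differ by exactly 1. A small sketch of that trick (Python; the helper name is mine):

```python
import struct

def ulp_distance(a, b):
    """Distance in ULPs between two nonnegative finite doubles."""
    # The IEEE-754 encoding of nonnegative doubles is monotonic when
    # read as an unsigned integer, so we can simply subtract.
    ia, = struct.unpack('>Q', struct.pack('>d', a))
    ib, = struct.unpack('>Q', struct.pack('>d', b))
    return abs(ia - ib)

# The two platform results discussed earlier differ by exactly 1 ULP:
x86 = struct.unpack('>d', bytes.fromhex('0003b16ef930a76e'))[0]
ppc = struct.unpack('>d', bytes.fromhex('0003b16ef930a76f'))[0]
assert ulp_distance(x86, ppc) == 1
```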
So, there might be some hope, at least for double-precision ops. Currently, AFAIK, the latest NVIDIA GPUs are the only ones with double-precision FP support. But things are changing rapidly in this area: in about a year, Intel will release Larrabee, which will "fully support IEEE standards for single and double precision floating-point arithmetic". Hopefully this forces the other vendors to follow suit, and future OpenCL revisions will reflect this.
Cheers, Josh
Nicolas Cellier writes:
<bryce <at> kampjes.demon.co.uk> writes:
nicolas cellier writes:
I don't even know which hardware supports these traps, and I wonder whether such features are even mandatory. That should not stop us: as suggested, Smalltalk is already sufficiently slow to afford a sticky-flag test at each Float primitive in case no hardware trap is available.
Hopefully Smalltalk will not always be that slow.
All we'd need to do was remove the boxing and unboxing around floating point operations. Inside single statements, that is equivalent to the Boolean optimisations that Exupery already does. Across statements will require a proper optimiser.
Bryce
Great, but if FPU registers are using extra precision (extended double), then the result of operations might depend on register/memory exchange.
The x86 SSE2 instruction set uses 64 bit floats. It also supports 16 floating point registers on 64 bit platforms. My plan is to use that.
IMO the user needs control over the precision/portability/performance tradeoffs but I haven't thought of any good ways of providing such control.
Bryce
Andreas Raab a écrit :
nicolas cellier wrote:
[NaN comparisons]
[... snip ...]
However there are other cases like these ones where it could serve:
(1/2) < #(0 1 2). "#(false true true)"
(1/2) > #(0 1 2). "Array does not understand <"

(1/2) < '1'. "true"
(1/2) > '1'. "Error: Instances of Fraction are not indexable"
I agree, these are not beautiful examples; some of this functionality might better find its way out of the image.
Notice that: 3 > '2' and: [3.0 > '2']. "true"
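For contrast, some languages refuse such mixed comparisons instead of coercing; Python 3, for example, fails loudly (my illustration, not from the thread):

```python
# Squeak answers true for 3 > '2'; Python 3 raises instead:
try:
    3 > '2'
except TypeError as exc:
    print(exc)  # '>' not supported between instances of 'int' and 'str'
```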
Why not define in Fraction what already is in Integer and Float?
No reason. I wasn't aware that your changes addressed both situations. I was only concerned that addressing the NaN issues would lead to a proliferation of Magnitude subclasses having to implement all of the comparison operations instead of just a limited subset.
I redefine only one message in Magnitude:

>= aMagnitude
	^aMagnitude <= self

instead of:

>= aMagnitude
	^(self < aMagnitude) not

This is absolutely not necessary and can be skipped.
What I had in mind was that it would be sufficient to define only < in the subclass in case of a total order, and only < and <= in case of a partial order. That would work with NaN, but it is not true with Array, for example... The argument class also has to agree on the reverse relation.
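The NaN case makes the partial-order point concrete: >= cannot be derived as (self < arg) not. A quick check in Python, whose float comparisons follow IEEE 754:

```python
nan = float('nan')

# NaN is unordered with respect to every number, so both tests are False:
assert not (nan < 1.0)
assert not (nan >= 1.0)
# Deriving ">= as not <" would therefore wrongly answer true for
# (nan >= 1.0); with a partial order, <= must be defined on its own.
```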
One idle thought I had today was whether one couldn't entirely get rid of float-valued NaNs and instead introduce a well-known NaN instance that *isn't* an instance of Float and consequently also isn't a Number, doesn't do any arithmetic etc. etc. etc.
It would be fairly simple to change the VM to have a well-known NaN slot in the specialObjectsArray so that when a primitive attempts to return a float-valued NaN via pushFloat: the VM instead pushes that NaN object. This would fix 99% of all places that could ever return a float-valued NaN, including things like ByteArray>>doubleAt:, or the FFI etc.
What I have personally no clue about is whether anyone ever uses float-valued NaNs for anything useful. I have never done that but there may be people who have and it would be interesting to see whether they have any insight into the use of a non-float NaN instance.
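Andreas's point about Sets and Dicts is easy to demonstrate outside Smalltalk; Python shows both the IEEE behavior and the identity workaround its containers use to paper over it (my illustration, not from the thread):

```python
nan = float('nan')
assert nan != nan                # IEEE 754: NaN is not equal to itself

s = {nan}
assert nan in s                  # found only via the identity shortcut
assert float('nan') not in s     # a distinct NaN can never be looked up
```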
Mostly agree, but: 1) I would prefer a behaviour controlled at image level; read more in my response to Paolo. 2) We cannot just ignore the outside world; I think Eliot, Paolo, and John answered enough about it.
[point-number comparisons]
This is an elegant solution (nice use of adaptTo:andCompare:) but I'm not sure how much code will break with it. Have you tried running this for a while in daily use?
No, I have absolutely no guarantee.
Sure. I'm just asking for anecdotal evidence in your day-to-day images. Are you actively using these changes?
No, it is in my personal images, but 1) I don't use Squeak intensively (not professionally), and 2) the changes are far too young. Some volunteers have to help...
Cheers,
- Andreas