Reading the CUDA 2.0 programming guide, I noticed an interesting difference between the error bounds for single- and double-precision floating point:
Double-precision
  Operation   ULPs
  x+y         0 (IEEE-754 round-to-nearest-even)
  x*y         0 (IEEE-754 round-to-nearest-even)
  x/y         0 (IEEE-754 round-to-nearest-even)
  1/x         0 (IEEE-754 round-to-nearest-even)
  sqrt(x)     0 (IEEE-754 round-to-nearest-even)

Single-precision
  Operation   ULPs
  x+y         0 (IEEE-754 round-to-nearest-even)
  x*y         0 (IEEE-754 round-to-nearest-even)
  x/y         2
  1/x         1
  sqrt(x)     3
So there might be some hope, at least for double-precision ops. Currently, AFAIK, the latest NVIDIA GPUs are the only ones with double-precision FP support. But things are changing rapidly in this area: in about a year, Intel will release Larrabee, which will "fully support IEEE standards for single and double precision floating-point arithmetic". Hopefully this will force the other vendors to follow suit, and future OpenCL revisions will reflect this.
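To make the ULP figures above concrete, here is a minimal Python sketch that counts how many representable single-precision values separate two results. The helper names (float32_bits, ordered, ulp_distance) are my own, not from the CUDA guide or the OpenCL spec. Note that the guides define the error against the infinitely precise result, so this bit-distance is only a convenient proxy for comparing two computed values.

```python
import struct

def float32_bits(x: float) -> int:
    """Round x to single precision and return its 32-bit pattern as an int."""
    return struct.unpack("<I", struct.pack("<f", x))[0]

def ordered(bits: int) -> int:
    """Map a sign-magnitude float32 pattern onto a monotonic integer line.

    Negative floats map to negative integers, so +0.0 and -0.0 both map to 0.
    """
    if bits & 0x80000000:
        return -(bits & 0x7FFFFFFF)
    return bits

def ulp_distance(a: float, b: float) -> int:
    """Number of float32 steps between a and b (not meaningful for NaN)."""
    return abs(ordered(float32_bits(a)) - ordered(float32_bits(b)))
```

A distance of 0 means the two results are bit-identical in single precision (treating +0.0 and -0.0 as equal); anything larger means two devices would have diverged.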
Cheers, Josh
Josh Gargus wrote:
Bert Freudenberg wrote:
On 10.01.2009, at 11:01, Josh Gargus wrote:
As noted by John, Croquet uses fdlibm for bit-identical floating point math. Does anyone have a feeling for how difficult (or impossible) it will be to achieve identical computation on OpenCL-compliant devices?
The numerical behavior of compliant OpenCL implementations is covered in section 7 of the OpenCL spec. In particular, table 7.1 gives the error bounds for the various operations. If I interpret that correctly, very few functions are guaranteed to give bit-identical results.
Oops, I was skimming by the time I read that part of the spec. I saw that the transcendental functions need not return identical results (which was why I mentioned porting fdlibm), but I missed that even x/y isn't precisely specified.
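As a rough illustration of why a loosely specified x/y matters for bit-identical results, the following Python sketch compares two ways of dividing in single precision: a single rounding of the quotient, and a hypothetical multiply-by-reciprocal with a rounding after each step. Neither function is taken from any real GPU implementation; both are assumptions chosen only to show how two plausible strategies can be compared bit for bit.

```python
import struct

def f32(x: float) -> float:
    """Round a Python float (a double) to the nearest single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

def div_rounded(x: float, y: float) -> float:
    """x/y computed in double, then rounded once to single.

    For single-precision inputs this should match a correctly rounded
    single-precision divide.
    """
    return f32(f32(x) / f32(y))

def div_via_reciprocal(x: float, y: float) -> float:
    """Hypothetical alternative: x * (1/y), rounding to single after each step."""
    reciprocal = f32(1.0 / f32(y))
    return f32(f32(x) * reciprocal)

def count_mismatches(limit: int) -> int:
    """Count integer pairs (x, y) in 1..limit where the two methods disagree."""
    mismatches = 0
    for x in range(1, limit + 1):
        for y in range(1, limit + 1):
            if div_rounded(float(x), float(y)) != div_via_reciprocal(float(x), float(y)):
                mismatches += 1
    return mismatches

if __name__ == "__main__":
    limit = 100
    print(count_mismatches(limit), "of", limit * limit, "quotients differ")
```

Any nonzero count is a pair of inputs for which two devices using these two strategies would no longer agree exactly, even though each result may sit within a small ULP bound.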
For example, how difficult would it be to port, say, fdlibm, so that transcendentals use the exact same code? Any other show-stoppers that might not occur to the naive mind :-) ?
Well, OpenCL only requires single precision; double-precision support is optional, whereas fdlibm is double-precision only.
I didn't know that. Maybe that's what the "d" stands for in "fdlibm".
I don't know how compliant current implementations actually are.
I think that there's only one implementation right now, and only available to paid-up Apple developers. Anyway, I'm more interested in what the spec says than current conformance... implementations will gradually become more compliant.
Thanks, Josh
- Bert -