Re: [squeak-dev] OpenCL

10 Jan 2009


      Reading the CUDA 2.0 programming guide, I saw an interesting difference 
between the error-bounds for single- and double-precision floating point:

Double-precision:

  Operation      ULPs
  x+y            0 (IEEE-754 round-to-nearest-even)
  x*y            0 (IEEE-754 round-to-nearest-even)
  x/y            0 (IEEE-754 round-to-nearest-even)
  1/x            0 (IEEE-754 round-to-nearest-even)
  sqrt(x)        0 (IEEE-754 round-to-nearest-even)

Single-precision:

  Operation      ULPs
  x+y            0 (IEEE-754 round-to-nearest-even)
  x*y            0 (IEEE-754 round-to-nearest-even)
  x/y            2
  1/x            1
  sqrt(x)        3

So, there might be some hope, at least for double-precision ops.  
Currently, AFAIK, the latest NVIDIA GPUs are the only ones with 
double-precision FP support.  But, things are changing rapidly in this 
area: in about a year, Intel will release Larrabee, which will "fully 
support IEEE standards for single and double precision floating-point 
arithmetic".  Hopefully this forces the other vendors to follow suit, 
and future OpenCL revisions reflect this.
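
To make the ULP figures above concrete, here is a minimal sketch of how one might measure the distance, in ULPs, between two single-precision results (say, one from a CPU and one from a GPU). The helper names (f32_bits, ordered, ulp_distance_f32) are made up for illustration and are not part of CUDA, OpenCL, or fdlibm. A distance of 0 means the two results are bit-identical (treating -0.0 and +0.0 as equal); NaNs are not handled, and values too large for single precision make struct.pack raise OverflowError.

```python
import struct

def f32_bits(x):
    """Round x to IEEE-754 single precision and return its 32-bit pattern."""
    return struct.unpack("<I", struct.pack("<f", x))[0]

def ordered(bits):
    """Map a sign-magnitude bit pattern onto a monotonically ordered integer."""
    if bits & 0x80000000:
        return -(bits & 0x7FFFFFFF)   # negative floats count downward from zero
    return bits

def ulp_distance_f32(a, b):
    """Number of representable single-precision steps between a and b."""
    return abs(ordered(f32_bits(a)) - ordered(f32_bits(b)))

if __name__ == "__main__":
    # Adjacent single-precision values around 1.0 are exactly one step apart.
    print(ulp_distance_f32(1.0, 1.0 + 2.0 ** -23))
```

Under this measure, two devices produce "identical computation" for an operation only if the distance is 0 for every input, which is a much stronger requirement than staying within a 2 or 3 ULP bound.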
Cheers,
Josh
Josh Gargus wrote:
...
Bert Freudenberg wrote:
...
On 10.01.2009, at 11:01, Josh Gargus wrote:
...
As noted by John, Croquet uses fdlibm for bit-identical floating 
point math.  Does anyone have a feeling for how difficult (or 
impossible) it will be to achieve identical computation on 
OpenCL-compliant devices?
The numerical behavior of compliant OpenCL implementations is covered 
in section 7 of the OpenCL spec. In particular, table 7.1 gives the 
error bounds for the various operations. If I interpret that 
correctly, very few functions are guaranteed to behave bit-identically.
Oops, I was skimming by the time I read that part of the spec.  I saw 
that the transcendental functions need not return identical results 
(which was why I mentioned porting fdlibm), but I missed that even x/y 
isn't precisely specified.
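
As a rough illustration of why a loosely specified x/y matters, the sketch below emulates single precision on the host and compares a correctly rounded quotient with one computed as x * (1/y). That shortcut is only one assumed way a device could implement division, not something the spec or any vendor is known to mandate, and the helper names (to_f32, div_rounded, div_via_reciprocal) are hypothetical. It relies on the fact that, as far as I know, rounding a double-precision quotient or product of single-precision inputs down to single gives the same result as rounding once.

```python
import struct

def to_f32(x):
    """Round a Python float (a double) to the nearest single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

def div_rounded(x, y):
    """Single-precision quotient, rounded to nearest as IEEE-754 requires."""
    return to_f32(to_f32(x) / to_f32(y))

def div_via_reciprocal(x, y):
    """Quotient computed as x * (1/y), rounding after each step."""
    r = to_f32(1.0 / to_f32(y))       # rounded single-precision reciprocal
    return to_f32(to_f32(x) * r)      # rounded single-precision product

if __name__ == "__main__":
    a = div_rounded(5.0, 3.0)
    b = div_via_reciprocal(5.0, 3.0)
    # The two results differ in the last bit of the significand.
    print(a.hex(), b.hex(), a == b)
```

For 5/3 the two approaches land on neighbouring single-precision values, so both would be within a 2 ULP bound, yet they are not bit-identical.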
...
...
For example, how difficult would it be to port, say, fdlibm, so 
that transcendentals use the exact same code?  Any other show-stoppers 
that might not occur to the naive mind :-)  ?
Well, OpenCL only requires single-precision; double-precision support 
is optional, whereas fdlibm is double-precision only.
I didn't know that.  Maybe that's what the "d" stands for in "fdlibm".
...
I don't know how compliant current implementations actually are.
I think that there's only one implementation right now, and it's only 
available to paid-up Apple developers.  Anyway, I'm more interested in 
what the spec says than current conformance... implementations will 
gradually become more compliant.
Thanks,
Josh
...

Bert -