[squeak-dev] OpenCL

Josh Gargus schwa at fastmail.us
Sat Jan 10 20:30:13 UTC 2009

Reading the CUDA 2.0 programming guide, I saw an interesting difference 
between the error-bounds for single- and double-precision floating point:

Double precision:

Operation      Max error (ULPs)
x+y            0 (IEEE-754 round-to-nearest-even)
x*y            0 (IEEE-754 round-to-nearest-even)
x/y            0 (IEEE-754 round-to-nearest-even)
1/x            0 (IEEE-754 round-to-nearest-even)
sqrt(x)        0 (IEEE-754 round-to-nearest-even)

Single precision:

Operation      Max error (ULPs)
x+y            0 (IEEE-754 round-to-nearest-even)
x*y            0 (IEEE-754 round-to-nearest-even)
x/y            2
1/x            1
sqrt(x)        3

So, there might be some hope, at least for double-precision ops.  
Currently, AFAIK, the latest NVIDIA GPUs are the only ones with 
double-precision FP support.  But, things are changing rapidly in this 
area: in about a year, Intel will release Larrabee, which will "fully 
support IEEE standards for single and double precision floating-point 
arithmetic".  Hopefully this forces the other vendors to follow suit, 
and future OpenCL revisions reflect this.
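
For anyone who wants to check bounds like these empirically, here is a 
minimal Python sketch of how one might measure the distance in ULPs 
between two single-precision results. It is only an illustration: the 
helper names float32_key and ulp_distance32 are made up for this 
message and do not come from the CUDA guide or the OpenCL spec. It 
assumes finite inputs that fit in single precision.

```python
import struct

def float32_key(x):
    """Round x to IEEE-754 single precision and return an integer key
    that increases monotonically with the float value (hypothetical helper)."""
    (bits,) = struct.unpack("<I", struct.pack("<f", x))
    # The raw pattern is sign-magnitude; fold the negative half so that
    # adjacent floats always differ by exactly 1 and -0.0 == +0.0.
    return bits if bits < 0x80000000 else 0x80000000 - bits

def ulp_distance32(a, b):
    """Number of representable single-precision values between a and b."""
    return abs(float32_key(a) - float32_key(b))

# Usage: compare a device result against a reference result.
reference = 1.0
device_result = 1.0 + 2.0 ** -23   # next single-precision value above 1.0
print(ulp_distance32(reference, device_result))
```

A distance of 0 means the two results are bit-identical (apart from the 
sign of zero), which is what Croquet needs; a 2-ULP bound on x/y only 
promises a distance of at most 2.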



Josh Gargus wrote:
> Bert Freudenberg wrote:
>> On 10.01.2009, at 11:01, Josh Gargus wrote:
>>> As noted by John, Croquet uses fdlibm for bit-identical floating 
>>> point math.  Does anyone have a feeling for how difficult (or 
>>> impossible) it will be to achieve identical computation on 
>>> OpenCL-compliant devices?
>> The numerical behavior of compliant OpenCL implementations is covered 
>> in section 7 of the OpenCL spec. In particular, table 7.1 gives the 
>> error bounds for the various operations. If I interpret that 
>> correctly, very few functions are guaranteed to behave bit-identical.
> Oops, I was skimming by the time I read that part of the spec.  I saw 
> that the transcendental functions need not return identical results 
> (which was why I mentioned porting fdlibm), but I missed that even x/y 
> isn't precisely specified.
>>>   For example, how difficult would it be to port, say, fdlibm, so 
>>> that transcendentals use the exact same code?  Any other show-stoppers 
>>> that might not occur to the naive mind :-)  ?
>> Well OpenCL only requires single-precision, double-precision support 
>> is optional, whereas fdlibm is double-precision only. 
> I didn't know that.  Maybe that's what the "d" stands for in "fdlibm".
>> I don't know how compliant current implementations actually are.
> I think that there's only one implementation right now, and only 
> available to paid-up Apple developers.  Anyway, I'm more interested in 
> what the spec says than current conformance... implementations will 
> gradually become more compliant.
> Thanks,
> Josh
>> - Bert -
