[squeak-dev] OpenCL

Sat Jan 10 20:30:13 UTC 2009

Reading the CUDA 2.0 programming guide, I saw an interesting difference
between the error bounds for single- and double-precision floating point:

Double-precision
Operation      ULPs
x+y            0 (IEEE-754 round-to-nearest-even)
x*y            0 (IEEE-754 round-to-nearest-even)
x/y            0 (IEEE-754 round-to-nearest-even)
1/x            0 (IEEE-754 round-to-nearest-even)
sqrt(x)        0 (IEEE-754 round-to-nearest-even)

Single-precision
Operation      ULPs
x+y            0 (IEEE-754 round-to-nearest-even)
x*y            0 (IEEE-754 round-to-nearest-even)
x/y            2
1/x            1
sqrt(x)        3

So, there might be some hope, at least for double-precision ops.  
Currently, AFAIK, the latest NVIDIA GPUs are the only ones with
double-precision FP support.  But, things are changing rapidly in this 
area: in about a year, Intel will release Larrabee, which will "fully 
support IEEE standards for single and double precision floating-point 
arithmetic".  Hopefully this will force the other vendors to follow suit,
and future OpenCL revisions will reflect this.

Cheers,
Josh

Josh Gargus wrote:
> Bert Freudenberg wrote:
>> On 10.01.2009, at 11:01, Josh Gargus wrote:
>>> As noted by John, Croquet uses fdlibm for bit-identical floating 
>>> point math.  Does anyone have a feeling for how difficult (or 
>>> impossible) it will be to achieve identical computation on 
>>> OpenCL-compliant devices?
>>
>> The numerical behavior of compliant OpenCL implementations is covered 
>> in section 7 of the OpenCL spec. In particular, table 7.1 gives the 
>> error bounds for the various operations. If I interpret that 
>> correctly, very few functions are guaranteed to behave bit-identical.
>
> Oops, I was skimming by the time I read that part of the spec.  I saw 
> that the transcendental functions need not return identical results 
> (which was why I mentioned porting fdlibm), but I missed that even x/y 
> isn't precisely specified.
>
>>
>>>   For example, how difficult would it be to port, say, fdlibm, so 
>>> that transcendentals use the exact same code?  Any other show-stoppers
>>> that might not occur to the naive mind :-)  ?
>>
>> Well, OpenCL only requires single precision; double-precision support
>> is optional, whereas fdlibm is double-precision only.
>
> I didn't know that.  Maybe that's what the "d" stands for in "fdlibm".
>
>> I don't know how compliant current implementations actually are.
>
> I think that there's only one implementation right now, and only 
> available to paid-up Apple developers.  Anyway, I'm more interested in 
> what the spec says than current conformance... implementations will 
> gradually become more compliant.
>
> Thanks,
> Josh
>
>
>> - Bert -