Float Question

Tim Olson tim at jumpnet.com
Thu Feb 4 16:54:39 UTC 1999


>I got most of the Float representation figured out but the following still
>puzzles me as something special seems to be going on with small floats:
>
>            testPrintOn:  Sign    Exponent bits               Mantissa bits
>    ---     ------------  ----    --------------------------  -------------
>    2.0     2e1           +       1 (16r400 2r10000000000)    16r0
>    4.0     4e1           +       2 (16r401 2r10000000001)    16r0
>    8.0     8e1           +       3 (16r402 2r10000000010)    16r0

Float>>testPrintOn: is an old "test" method that should have been 
removed.  It was only there to check the accuracy of printing the 
mantissa, and it does not print the exponent correctly.  Instead, use 
the standard printing routines found in the "printing" category:

     printOn:base:
     absPrintExactlyOn:base:

(absPrintExactlyOn:base: prints a Float "exactly", so that the printed 
representation, when read back in, matches the original number 
bit-for-bit.  It is quite a bit slower than printOn:base:, though.)
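
As a rough workspace sketch of the round-trip property described above 
(it assumes your image provides String class>>streamContents: and 
String>>asNumber, which may not hold in every Squeak version, and it uses 
a positive receiver because absPrintExactlyOn:base: presumably expects 
one, as the "abs" in its name suggests):

```smalltalk
"Print a Float with both routines, then read the exact form back in."
| x quick exact |
x := 0.1.
"The fast, standard printer."
quick := String streamContents: [:s | x printOn: s base: 10].
"The slower, exact printer."
exact := String streamContents: [:s | x absPrintExactlyOn: s base: 10].
"If the exact printer behaves as described, this should answer true."
exact asNumber = x
```

Comparing `quick` and `exact` for a few values is an easy way to see 
where the two routines differ, if they differ at all in your image.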

The other methods found in Float's "private" category are deprecated, and 
should probably be removed as well.


>    (1 - (self radix raisedTo: self precision negated))
>        * (self radix raisedTo: self emax)                <- added parens
>
>I never fully understood how this calculation worked.  I think it produces
>the maximum fraction multiplied by the maximum exponent.  How it produces
>the maximum fraction has me stumped.

Yes, this is what it does (or attempts to do, anyway).  "raisedTo:" is 
somewhat imprecise, so the final result is off by a few bits.  Also, 
looking at the calculation more closely, it is not designed very well: 
(2.0 raisedToInteger: 1024) produces Infinity, which ruins the rest of 
the calculation.  I think the best way to do this would be:

(2.0 - (self radix raisedToInteger: (self precision - 1) negated))
     * (self radix raisedToInteger: self emax)

[where emax == 1023, not 1024]

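This calculation is "overflow proof" and returns the correct value, down 
to the lsb.  It works by multiplying the largest significand (1.99999..., 
which is 2 - 2^-52 for IEEE-754 doubles, where precision is 53) by the 
largest representable power of two (2 raisedToInteger: 1023).  Both 
factors are finite and exactly representable, and so is their product.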
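
To make the arithmetic concrete, here is a workspace sketch with the 
IEEE-754 double values written out as literals.  The literals 52 and 1023 
are my substitutions for (self precision - 1) and (self emax); the 
variable names are only illustrative.

```smalltalk
"Largest finite double: (2 - 2^-52) * 2^1023, i.e. 2^1024 - 2^971."
| largestSignificand largestPower fmax |
"1.111...1 in binary, with 52 ones after the binary point."
largestSignificand := 2.0 - (2.0 raisedToInteger: -52).
"The largest power of two that is still finite."
largestPower := 2.0 raisedToInteger: 1023.
"Neither factor overflows, and the product is exactly representable."
fmax := largestSignificand * largestPower.
"By contrast, one more doubling is no longer finite."
fmax * 2.0
```

Inspecting `fmax` in hex (for example via its two 32-bit words) should 
show an exponent field of 16r7FE and a mantissa of all ones.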


>Does Squeak uses denormalized floats, and if so, what indicates a float is
>denormalized and how does one create an instance of a Float that is
>denormalized?

Yes, since Squeak uses IEEE-754 representation for Floats, denorms are 
supported.  You shouldn't normally (no pun intended!) have to know if a 
Float is denormalized or not, unless you are really concerned about 
detecting gradual underflow and loss of precision.

Anyway, a nonzero Float is a denorm if its absolute value is less than 
(2.0 raisedToInteger: -1022), the smallest normalized double.
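
A minimal workspace sketch of such a test, and of one way to create a 
denorm, follows.  It takes the smallest normalized IEEE-754 double, 
2^-1022, as the threshold.  The block name isDenorm is hypothetical, and 
the use of timesTwoPower: assumes your image has that selector on Float.

```smalltalk
| smallestNormal isDenorm aDenorm tiniest |
"Smallest normalized double under IEEE-754."
smallestNormal := 2.0 raisedToInteger: -1022.
"Nonzero and smaller in magnitude than the smallest normal."
isDenorm := [:x | x ~= 0.0 and: [x abs < smallestNormal]].
"Halving the smallest normal underflows gradually into a denorm."
aDenorm := smallestNormal / 2.0.
"Smallest positive denorm, built by scaling rather than by raisedToInteger:."
tiniest := 1.0 timesTwoPower: -1074.
isDenorm value: aDenorm.          "expected: true"
isDenorm value: tiniest.          "expected: true"
isDenorm value: smallestNormal.   "expected: false"
isDenorm value: 0.0               "expected: false"
```

Evaluate each of the last four lines with "print it" to see the 
individual answers.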


>I thought it might be a good idea to include the following description of
>Float representation under an "Implementation" heading in the Float class
>comment.  Let me know if you disagree, or have suggestions to improve to
>format, or any other comments.

I think that simply specifying that the representation complies with 
IEEE-754 is sufficient.  People who are interested in the implementation 
details likely know about 754 anyway (it is documented in many places 
on the web), and if all the details are listed, someone might think they 
are listed because the representation differs from IEEE-754 in some way.



     -- tim




