Float Question

R. A. Harmon harmonra at webname.com
Thu Feb 4 15:51:22 UTC 1999

```I finally got back to this.  Again, thanks to all for the terrific explanations.

I got most of the Float representation figured out but the following still
puzzles me as something special seems to be going on with small floats:

Float   testPrintOn:  Sign    Exponent bits               Mantissa bits
---     ------------  ----    --------------------------  -------------
2.0     2e1           +       1 (16r400 2r10000000000)    16r0
4.0     4e1           +       2 (16r401 2r10000000001)    16r0
8.0     8e1           +       3 (16r402 2r10000000010)    16r0

Thanks to Tim Olson's response, I now know that the following calculation
from the proposed standard doc. correctly produces #fmax:

precision is 53  (52 bit normalized mantissa +  assumed high order 1 bit)
emax is 1024     (11 bit exponent)

(1 - (self radix raisedTo: self precision negated)) *
     (self radix raisedTo: self emax)

I never fully understood how this calculation worked.  I think it produces
the maximum fraction multiplied by the maximum exponent.  How it produces
the maximum fraction has me stumped.

About the only thing I need now to finish up the <floatCharacterization>
protocol is whether Squeak uses denormalized floats.  Tim's response, and
the mention of "denormalized" in the methods:

Float 'mathematical functions' #reciprocalFloorLog:
Fraction 'converting' #asFloat

seem to indicate that it does, but I don't see any way to tell that a float
is denormalized, or how to create one.

Does Squeak use denormalized floats, and if so, what indicates a float is
denormalized and how does one create an instance of a Float that is
denormalized?

I thought it might be a good idea to include the following description of
Float representation under an "Implementation" heading in the Float class
comment.  Let me know if you disagree, or have suggestions to improve it.

"Implementation"

Float representation:

1 bit      - sign range: 1 or 0
             0 is positive and 1 is negative.
11 bits    - signed exponent range: -1023 through +1024
             value of bits minus 1023.
52 bits    - normalized mantissa range: 0 through 4503599627370495
             value of bits plus an assumed high order 1 bit.

The following are reserved values:

Reserved            Sign  Exponent         Mantissa          Accessing
Class var. name           raw bits         raw bits
------------------- ----  ---------------  ----------------  --------------------
* (zero)            +     -1023 (16r0)     0                 0.0
NegativeZero        -       "      "       0                 Float negativeZero
Infinity            +     1024 (16r7FF)    0                 Float infinity
NegativeInfinity    -       "      "       0                 0.0 - Float infinity
NaN                 +     1024 (16r7FF)    16r8000000000000  Float nan
* (denormalized)    +     -1023 (16r0)     not 0             -

The following are a couple of special values:

Special      Sign  Exponent          Mantissa              Accessing (ANSI)
------------ ----- --------------    --------------------  -----------------
Epsilon      +     -40 (16r3D7)      16r19799812DEA11      Float epsilon
MaxVal       +     1023 (16r7FE)     16rFFFFFFFFFFFFF      Float fmax

Notes:

Largest value that can be stored in n bits is:

((2 raisedTo: n) - 1).

Exponent offset is:

((Largest value - sign bit) / 2) negated

Largest signed exponent that can be stored in n bits is:

(Largest value) + (Exponent offset)

At 09:49 AM 1/3/99 -0600, Tim Olson wrote:
>Richard Harmon writes:
>
>[exponent field]
>>        11 bits     - value minus 1023 (16r3FF) to produce an exponent
>>                        of -1022 (-16r3FE) through +1023 (16r3FF)
>>                    - 16r000 reserved for Float zero (mantissa is ignored)
>>                    - 16r7FF reserved for Float underflow/overflow (mantissa is ignored)
>
>These are not quite right; they should be:
>
>          11 bits   - value minus 1023 to produce an exponent in the range
>                         -1023 .. +1024
>
>                    - 16r000:
>                         significand = 0: Float zero
>                         significand ~= 0: Denormalized number
>                              (exp = -1024, no hidden '1' bit)
>
>                    - 16r7FF:
>                         significand = 0: Infinity
>                         significand ~= 0: Not A Number (NaN) representation
>
>>        fmax Definition: Report the largest value allowed by the
>>characterized floating point object representation. This satisfies the
>>ISO/IEC 10967 floating point characterization requirement fmax, and is
>>equal to:
[snip]
>     (1 - (self radix raisedTo: self precision negated)) *
>          (self radix raisedTo: self emax)
>
>
>>        precision (52)
>
>[should be 53, there are 53 bits of precision including the hidden '1'
>bit]
>
>
>>        emax (1023)
>
>[should be 1024]
>
>
>With the correct values, I get:
>
>((1 - (2 raisedTo: 53 negated)) * (2 raisedTo: 1024)) asFloat
>
>     => 1.797693134862315e308
--
Richard A. Harmon          "The only good zombie is a dead zombie"
harmonra at webname.com           E. G. McCarthy

```
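
For readers who want to check the bit patterns and the #fmax formula discussed in this message, here is a minimal sketch in Python rather than Squeak. It assumes the platform float is an IEEE 754 double, as Tim's reply describes. The helper names float_fields, classify, and fmax are made up for illustration and are not Squeak or ANSI selectors. The sketch also suggests why the formula yields the maximum fraction: 1 - 2^-53 is binary 0.111...1 with 53 ones, so multiplying by 2^1024 gives 1.11...1 x 2^1023, that is, every mantissa bit set at the largest finite exponent.

```python
import struct
import sys
from fractions import Fraction


def float_fields(x):
    """Return (sign, raw_exponent, raw_mantissa) for a 64-bit IEEE 754 double."""
    # Reinterpret the 8 bytes of the double as one unsigned 64-bit integer.
    (bits,) = struct.unpack(">Q", struct.pack(">d", x))
    sign = bits >> 63                      # 1 bit
    raw_exponent = (bits >> 52) & 0x7FF    # 11 bits, biased by 1023
    raw_mantissa = bits & ((1 << 52) - 1)  # 52 bits, hidden '1' not stored
    return sign, raw_exponent, raw_mantissa


def classify(x):
    """Classify a double using the reserved exponent values from Tim's reply."""
    _, raw_exponent, raw_mantissa = float_fields(x)
    if raw_exponent == 0x000:
        return "zero" if raw_mantissa == 0 else "denormalized"
    if raw_exponent == 0x7FF:
        return "infinity" if raw_mantissa == 0 else "nan"
    return "normalized"


def fmax(radix=2, precision=53, emax=1024):
    """(1 - radix^-precision) * radix^emax, computed exactly, then converted."""
    # Exact rational arithmetic avoids overflowing on radix ** emax.
    exact = (1 - Fraction(1, radix ** precision)) * radix ** emax
    return float(exact)


if __name__ == "__main__":
    for value in (2.0, 4.0, 8.0):
        sign, exp, man = float_fields(value)
        print(value, sign, hex(exp), exp - 1023, hex(man))
    print(fmax() == sys.float_info.max)
    # Halving the smallest normalized double gives a denormalized one.
    print(classify(sys.float_info.min / 2))
```

In this sketch, halving the smallest normalized double (sys.float_info.min / 2) produces a denormalized value: its raw exponent field is 16r000 and its mantissa is nonzero. Whether Squeak offers an equivalent test is not something the sketch shows.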