Float Question
R. A. Harmon
harmonra at webname.com
Thu Feb 4 15:51:22 UTC 1999
I finally got back to this. Again, thanks to all for the terrific explanations.
I got most of the Float representation figured out but the following still
puzzles me as something special seems to be going on with small floats:
     testPrintOn:  Sign  Exponent bits               Mantissa bits
---  ------------  ----  --------------------------  -------------
2.0  2e1           +     1 (16r400 2r10000000000)    16r0
4.0  4e1           +     2 (16r401 2r10000000001)    16r0
8.0  8e1           +     3 (16r402 2r10000000010)    16r0
Thanks to Tim Olson's response, I now see that the following calculation from
the proposed standard doc. correctly produces #fmax:
radix is 2
precision is 53 (52 bit normalized mantissa + assumed high order 1 bit)
emax is 1024 (11 bit exponent)
(1 - (self radix raisedTo: self precision negated))
* (self radix raisedTo: self emax) <- added parens
I never fully understood how this calculation worked. I think it produces
the maximum fraction multiplied by the maximum exponent. How it produces
the maximum fraction has me stumped.
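One way to see where the maximum fraction comes from is to run the pieces in a workspace with exact arithmetic. The sketch below assumes Squeak's Integer and Fraction arithmetic stays exact until asFloat is sent; the temporary names are only for illustration.

```smalltalk
| radix precision emax maxFraction fmax |
radix := 2.
precision := 53.
emax := 1024.

"1 - 2^-53 is the exact Fraction (2^53 - 1) / 2^53, that is binary
 0.111...1 with 53 one bits: the largest 53 bit fraction below 1."
maxFraction := 1 - (radix raisedTo: precision negated).
maxFraction numerator = ((2 raisedTo: 53) - 1).       "true"
maxFraction denominator = (2 raisedTo: 53).            "true"

"Multiplying by 2^1024 only moves the binary point, so the product
 is an exact Integer: 53 one bits followed by 971 zero bits."
fmax := maxFraction * (radix raisedTo: emax).
fmax = (((2 raisedTo: 53) - 1) * (2 raisedTo: 971)).   "true"

"The same value written as 1.111...1 (binary) times 2^1023."
fmax = ((2 - (2 raisedTo: 52 negated)) * (2 raisedTo: 1023)).   "true"

"Tim's message reports 1.797693134862315e308 for this."
fmax asFloat.
```

Read this way, (1 - 2^-53) is the fraction with every one of the 53 precision bits set, and scaling it by 2^1024 gives the same number as an all-ones mantissa (with its assumed high order 1 bit) at exponent 1023.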
About the only thing I need now to finish up the <floatCharacterization>
protocol is whether Squeak uses denormalized floats. Tim's response, and the
mention of "denormalized" in the methods:
Float 'mathematical functions' #reciprocalFloorLog:
Fraction 'converting' #asFloat
seem to indicate that it does, but I don't see any way to tell that a float
is denormalized, or how to create one.
Does Squeak use denormalized floats, and if so, what indicates a float is
denormalized and how does one create an instance of a Float that is
denormalized?
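A possible way to probe this from a workspace is to halve a float until it drops below the normalized range and then look at its raw exponent bits. This is only a sketch: it assumes the underlying hardware does IEEE 754 gradual underflow and that basicAt: 1 answers the high-order 32-bit word of a Float. exponentBits is a made-up helper block, not an existing method.

```smalltalk
| x exponentBits |

"Hypothetical helper: pull the 11 raw exponent bits out of the high word.
 Assumes the high word holds sign (1 bit), exponent (11 bits) and the
 top 20 mantissa bits."
exponentBits := [:f | ((f basicAt: 1) bitShift: -20) bitAnd: 16r7FF].

"Halve 1.0 until it reaches 2^-1022, the smallest normalized double."
x := 1.0.
1022 timesRepeat: [x := x / 2.0].
exponentBits value: x.    "1 under the assumptions above"

"One more halving gives 2^-1023, which cannot be normalized."
x := x / 2.0.
exponentBits value: x.    "0 here, together with x > 0.0, would mean a denormalized float"
x > 0.0.
```

If the second probe answers 0 while x is still greater than zero, the float is being held in denormalized form; if x has become 0.0 instead, the platform is flushing small results to zero.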
I thought it might be a good idea to include the following description of
Float representation under an "Implementation" heading in the Float class
comment. Let me know if you disagree, or have suggestions to improve the
format, or any other comments.
Implementation
Float representation:
   1 bit   - sign                 range: 1 or 0
             0 is positive and 1 is negative.
   11 bits - signed exponent      range: -1023 through +1024
             value of bits minus 1023.
   52 bits - normalized mantissa  range: 0 through 4503599627370495
             value of bits plus an assumed high order 1 bit.
The following are reserved values:
Reserved             Sign  Exponent         Mantissa          Accessing
Class var. name            raw bits         raw bits
-------------------  ----  ---------------  ----------------  --------------------
* (zero)             +     -1023 (16r0)     0                 0.0
NegativeZero         -       "     "        0                 Float negativeZero
Infinity             +     1024 (16r7FF)    0                 Float infinity
NegativeInfinity     -       "     "        0                 0.0 - Float infinity
NaN                  +     1024 (16r7FF)    16r8000000000000  Float nan
* (denormalized)     +     -1023 (16r0)     not 0             -
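Going by Tim Olson's breakdown quoted further down (exponent bits 16r000 and 16r7FF are the two reserved patterns), the raw fields can be sorted into cases as in the sketch below. The classify block is hypothetical, purely for illustration, and takes the raw exponent and mantissa bits as plain Integers.

```smalltalk
| classify |

"Hypothetical helper block, not part of Squeak."
classify := [:expBits :mantissaBits |
	expBits = 16r7FF
		ifTrue: [mantissaBits = 0 ifTrue: [#infinity] ifFalse: [#nan]]
		ifFalse: [expBits = 0
			ifTrue: [mantissaBits = 0 ifTrue: [#zero] ifFalse: [#denormalized]]
			ifFalse: [#normalized]]].

classify value: 16r7FF value: 0.                  "#infinity"
classify value: 16r7FF value: 16r8000000000000.   "#nan"
classify value: 0 value: 0.                       "#zero"
classify value: 0 value: 1.                       "#denormalized"
classify value: 16r400 value: 0.                  "#normalized (the bits of 2.0)"
```

The sign bit plays no part in the classification; it only selects between the positive and negative form of each case.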
The following are a couple of special values:
Special       Sign   Exponent        Mantissa              Accessing (ANSI)
------------  -----  --------------  --------------------  -----------------
Epsilon       +      -40 (16r3D7)    16r19799812DEA11      Float epsilon
MaxVal        +      1023 (16r7FE)   16rFFFFFFFFFFFFF      Float fmax
Notes:
Largest value that can be stored in n bits is:
((2 raisedTo: n) - 1).
Exponent offset is:
((Largest value - sign bit) / 2) negated
Largest signed exponent that can be stored in n bits is:
(Largest value) + (Exponent offset)
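Worked through for the 11 bit exponent, the notes come out as below. This is just workspace arithmetic; it assumes that the sign bit term in the offset formula stands for the value 1, and because the offset is already negated it gets added to the raw bits.

```smalltalk
| n largest offset |
n := 11.

"Largest value that fits in n bits."
largest := (2 raisedTo: n) - 1.          "2047 = 16r7FF"

"Exponent offset; already negative because of negated."
offset := ((largest - 1) / 2) negated.   "-1023"

"Signed exponent = raw bits + offset."
largest + offset.                        "1024, largest signed exponent"
0 + offset.                              "-1023, smallest signed exponent"
16r400 + offset.                         "1, the exponent of 2.0"
16r7FE + offset.                         "1023, the exponent of MaxVal"
```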
At 09:49 AM 1/3/99 -0600, Tim Olson wrote:
>Richard Harmon writes:
>
>[exponent field]
>> 11 bits - value minus 1023 (16r3FF) to produce an exponent
>> of -1022 (-16r3FE) through +1023 (16r3FF)
>> - 16r000 reserved for Float zero (mantissa is ignored)
>> - 16r7FF reserved for Float underflow/overflow (mantissa
>>is ignored)
>
>These are not quite right; they should be:
>
> 11 bits - value minus 1023 to produce an exponent in the range
> -1023 .. +1024
>
> - 16r000:
> significand = 0: Float zero
> significand ~= 0: Denormalized number
> (exp = -1024, no hidden '1' bit)
>
> - 16r7FF:
> significand = 0: Infinity
> significand ~= 0: Not A Number (NaN)
>representation
>
>> fmax Definition: Report the largest value allowed by the
>>characterized floating point object representation. This satisfies the
>>ISO/IEC 10967 floating point characterization requirement fmax, and is
>>equal to:
[snip]
> (1 - (self radix raisedTo: self precision negated)) *
> (self radix raisedTo: self emax)
>
>
>> precision (52)
>
>[should be 53, there are 53 bits of precision including the hidden '1'
>bit]
>
>
>> emax (1023)
>
>[should be 1024]
>
>
>With the correct values, I get:
>
>((1 - (2 raisedTo: 53 negated)) * (2 raisedTo: 1024)) asFloat
>
> => 1.797693134862315e308
--
Richard A. Harmon "The only good zombie is a dead zombie"
harmonra at webname.com E. G. McCarthy