I’m really curious about how double-precision floating-point numbers are stored.
Here is what I have figured out so far:
- They require 64 bits in memory
- They consist of three parts:
  - Sign bit (1 bit long)
  - Exponent (11 bits long)
  - Fraction (52 bits stored, plus an implicit leading 1 bit for 53 bits of precision; when all exponent bits are 0, as for zero and subnormal numbers, the implicit bit is 0 instead)
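The layout in the list above can be checked by pulling the three fields out of a real double. This is a minimal sketch, assuming Python's `float` is an IEEE 754 binary64 double (true on mainstream platforms); `double_fields` is a hypothetical helper name, and `struct` is only used to reinterpret the 8 bytes as a 64-bit integer.

```python
import struct

def double_fields(x: float) -> tuple[int, int, int]:
    """Split a double into its sign, biased exponent and stored fraction bit fields."""
    # Reinterpret the 8 bytes of the double as an unsigned 64-bit integer (big-endian).
    (bits,) = struct.unpack(">Q", struct.pack(">d", x))
    sign = bits >> 63                  # top bit
    exponent = (bits >> 52) & 0x7FF    # next 11 bits (biased exponent)
    fraction = bits & ((1 << 52) - 1)  # low 52 bits (stored fraction, no implicit bit)
    return sign, exponent, fraction

print(double_fields(1.0))     # (0, 1023, 0)
print(double_fields(-2.0))    # (1, 1024, 0)
print(double_fields(5e-324))  # (0, 0, 1): smallest subnormal, exponent field all zero
```

The last line shows the case where the exponent bits are all 0, which is when the implicit leading bit becomes 0.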
However, I do not understand what the exponent and the exponent bias are, or all the formulas on the Wikipedia page.
Can anyone explain what all those things are, how they work, and how the bits are eventually turned into the real number, step by step?
Check out the formula a little further down the page:
Except for the above exceptions, the entire double-precision number is described by:
(-1)^sign * 2^(exponent - bias) * 1.mantissa
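The formula can be evaluated directly from the three fields. This is a sketch that only handles normal numbers (exponent field 1 to 2046); `value_from_fields` and `BIAS` are hypothetical names chosen for illustration.

```python
BIAS = 1023  # exponent bias for double precision

def value_from_fields(sign: int, exponent: int, fraction: int) -> float:
    """Evaluate (-1)^sign * 2^(exponent - bias) * 1.mantissa for a normal double."""
    if not 1 <= exponent <= 2046:
        raise ValueError("only normal numbers handled (exponent field 1..2046)")
    mantissa = 1 + fraction / 2**52  # implicit leading 1 plus the 52 stored bits
    return (-1) ** sign * 2.0 ** (exponent - BIAS) * mantissa

print(value_from_fields(0, 1023, 0))        # 1.0
print(value_from_fields(1, 1024, 1 << 51))  # -3.0  (binary 1.1 * 2^1, negated)
```

Every step here is exact in floating point, because dividing by 2**52 and multiplying by a power of two only shift the exponent.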
The formula means that for normal numbers (not NaN, infinity, zero or denormal, which I’ll ignore), you take the 52 bits of the mantissa and add an implicit 1 bit at the top. This makes the mantissa 53 bits wide, in the range 1.0 … 1.111…11 (binary). To get the actual value, you multiply the mantissa by 2 raised to the power of the stored exponent minus the bias (1023), and negate the result if the sign bit is 1.

The number 1.0 has an unbiased exponent of zero (i.e. 1.0 = 1.0 * 2^0), and its biased exponent is 1023 (the bias is just added to the exponent). So 1.0 is stored as sign = 0, exponent = 1023, mantissa = 0 (remember the hidden mantissa bit).
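Going the other way, the fields for 1.0 (or any other normal double) can be derived step by step with `math.frexp`, which returns `m, e` with `x == m * 2**e` and `0.5 <= m < 1`. This is a sketch under that assumption; `fields_from_value` is a hypothetical helper, and zero, subnormals, infinities and NaN are rejected rather than encoded.

```python
import math

BIAS = 1023  # exponent bias for double precision

def fields_from_value(x: float) -> tuple[int, int, int]:
    """Derive sign, biased exponent and stored fraction for a normal, finite, nonzero double."""
    if not math.isfinite(x) or x == 0:
        raise ValueError("zero, infinite and NaN values are not handled")
    sign = 1 if x < 0 else 0
    m, e = math.frexp(abs(x))  # abs(x) == m * 2**e with 0.5 <= m < 1
    unbiased = e - 1           # rewrite as (2*m) * 2**(e - 1) with 1 <= 2*m < 2
    biased = unbiased + BIAS   # the bias is simply added
    if not 1 <= biased <= 2046:
        raise ValueError("subnormal values are not handled")
    fraction = int((2 * m - 1) * 2**52)  # drop the implicit leading 1, keep 52 bits
    return sign, biased, fraction

print(fields_from_value(1.0))   # (0, 1023, 0)
print(fields_from_value(-3.0))  # (1, 1024, 2251799813685248), i.e. fraction = 1 << 51
```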
Putting it all together, in hexadecimal the value is 0x3FF0000000000000 == 1.0.
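A 64-bit pattern is 16 hex digits, so the fields can be packed and compared against the bits Python actually stores. This is a minimal sketch assuming `float` is IEEE 754 binary64; `assemble` and `bits_of` are hypothetical helper names.

```python
import struct

def assemble(sign: int, exponent: int, fraction: int) -> int:
    """Pack the three fields into one 64-bit pattern: sign | exponent | fraction."""
    return (sign << 63) | (exponent << 52) | fraction

def bits_of(x: float) -> int:
    """Read back the actual 64-bit pattern of a double (big-endian reinterpretation)."""
    (bits,) = struct.unpack(">Q", struct.pack(">d", x))
    return bits

pattern = assemble(0, 1023, 0)
print(f"0x{pattern:016X}")        # 0x3FF0000000000000
print(pattern == bits_of(1.0))    # True
print(f"0x{bits_of(-1.0):016X}")  # 0xBFF0000000000000 (only the sign bit differs)
```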