There are certain int values that a float cannot represent.
However, can a double represent all values a float can represent?
My intuition says yes, since double has more fractional bits & more exponent bits, but there might be some silly gotchas that I’m missing.
Yes.
It would probably help to know how floats and doubles work.
Without going too much into details…
Take the number 152853.5047 (the revolution period of Jupiter’s moon Io in seconds).
In scientific notation, this number is 0.1528535047 × 10^6.
Since computers only understand 1s and 0s, the mantissa (1528535047) and the exponent (6) are stored in binary within 32 bits (the real format uses powers of 2 rather than 10, but the idea is the same). If I remember correctly, only 24 bits are for the mantissa, so floating point is usually more about precision than size. The larger the number, the less precise it can be.
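To see the mantissa/exponent split in base 2, here is a small Python sketch using the standard `math.frexp`. Note that a Python float is a double, so this only illustrates the idea and is not a picture of a 32-bit float.

```python
import math

x = 152853.5047  # Io's period in seconds, the example number from this answer

# frexp splits x into m * 2**e with 0.5 <= m < 1,
# the binary counterpart of writing 0.1528535047 x 10^6
m, e = math.frexp(x)
print(m, e)

# Scaling by a power of two is exact, so the two parts rebuild x exactly
print(m * 2**e == x)
```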
1528535047 =
1011011000110111001100000000111
That is 31 bits, so you can only store the first 24… the last seven bits, including the three 1’s, are lopped off. Since integers are 32 bits, you’re right, a float can’t accurately contain every one of them: the less significant digits get lopped off the end.
Any integer with an absolute value of up to 2^24 (16,777,216) can be stored without losing precision, because it fits in the 24 bits of mantissa.
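You can check this yourself. Below is a minimal Python sketch that rounds a number to a 32-bit float by packing it with the standard `struct` module. The helper name `to_float32` is my own, and it assumes the platform's C `float` is an IEEE 754 single (true on essentially every machine you will meet).

```python
import struct

def to_float32(value):
    """Round value to the nearest 32-bit float, returned as a Python float."""
    return struct.unpack("f", struct.pack("f", float(value)))[0]

print(int(to_float32(1528535047)))  # 1528535040: the low seven bits are gone
print(int(to_float32(2**24)))       # 16777216: still exact
print(int(to_float32(2**24 + 1)))   # 16777216: first integer a float cannot hold
```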
This is how the bits are stored in a floating point number:
(Diagram of how floats are stored: http://phimuemue.wordpress.com/files/2009/06/576px-ieee-754-single-svg1.png)
One bit for the sign, 8 bits for the exponent and 23 bits for the mantissa (the leading 1 is implied rather than stored, which is where the 24 bits of precision come from). Therefore, to answer your question, since only 23 bits are reserved for the mantissa, a 32-bit integer can’t always be shown with full precision. It will quickly start lopping off digits (from the right) as more digits are needed.
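If you want to look at those three fields directly, something like this works in Python. It is only a sketch: `float32_fields` is a name I made up, and the shifts and masks simply follow the 1/8/23 layout described above.

```python
import struct

def float32_fields(value):
    """Return (sign, exponent, mantissa) of value stored as an IEEE 754 single."""
    # Pack as a big-endian 32-bit float, then reread the same bytes as an unsigned int
    (bits,) = struct.unpack(">I", struct.pack(">f", value))
    sign = bits >> 31                 # 1 bit
    exponent = (bits >> 23) & 0xFF    # 8 bits, stored with a bias of 127
    mantissa = bits & 0x7FFFFF        # 23 stored bits, the leading 1 is implied
    return sign, exponent, mantissa

print(float32_fields(1.0))   # (0, 127, 0)
print(float32_fields(-2.5))  # (1, 128, 2097152)
```

For -2.5 the value is -1.25 × 2^1, so the exponent field is 127 + 1 = 128 and the only mantissa bit set is the one worth 0.25.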
For a double, you’re merely increasing the number of bits that it can store (11 for the exponent and 52 for the mantissa)… in fact, it’s called double precision, so any number that can be shown as a float can be shown as a double. Extra 0’s are merely added to the mantissa.
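The "extra 0’s" claim is easy to check. In the sketch below (Python, where every float is already a double) a value is first rounded to a 32-bit float with a hypothetical helper `to_float32`, and then the double's bit pattern is inspected. It assumes IEEE 754 singles and doubles.

```python
import struct

def to_float32(value):
    """Round value to the nearest 32-bit float, returned as a Python float (a double)."""
    return struct.unpack("f", struct.pack("f", float(value)))[0]

x = to_float32(0.1)  # nearest float to 0.1, now held in a double

# Widening lost nothing: narrowing again gives back the very same value
print(to_float32(x) == x)  # True

# The double has 52 - 23 = 29 more mantissa bits than the float; all are zero
(bits,) = struct.unpack(">Q", struct.pack(">d", x))
print(bits & ((1 << 29) - 1))  # 0
```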
For this reason, since a double takes up 64 bits, most people will use a double when converting a 32-bit int to floating point. A float would be good for converting a 16-bit short.
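A quick way to convince yourself of both statements, again as a Python sketch with a made-up `to_float32` helper standing in for a real 32-bit float (Python's own `float` plays the role of the double):

```python
import struct

def to_float32(value):
    """Round value to the nearest 32-bit float, returned as a Python float."""
    return struct.unpack("f", struct.pack("f", float(value)))[0]

INT32_MAX = 2**31 - 1

# A double has 53 bits of precision, so the largest 32-bit int survives
print(int(float(INT32_MAX)) == INT32_MAX)       # True

# A float has only 24, so the same int gets rounded
print(int(to_float32(INT32_MAX)) == INT32_MAX)  # False

# Every 16-bit short fits comfortably in a float
all_shorts_ok = all(int(to_float32(n)) == n for n in range(-2**15, 2**15))
print(all_shorts_ok)                            # True
```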