I have a question that arose from another question about the precision of floating-point numbers.
I know that decimal values cannot always be represented exactly in floating point, so they are stored as the closest representable floating-point number.
My question is about the difference between the representations of float and double.
Where does this question arise from?
Suppose I do:
System.out.println(.475d+.075d);
the output is not 0.55 but 0.5499999999999999 (on my machine).
However, when I do:
System.out.println(.475f+.075f);
I get the correct answer, 0.55, which I did not expect.
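The two statements above can be combined into one small program. This is a minimal sketch; the class and method names are hypothetical. Besides printing both sums, it compares each sum with the literal 0.55 of the same type, which shows whether the sum landed on the representable value nearest to 0.55.

```java
// Hypothetical class name; reproduces the two println calls from the question.
public class FloatVsDoubleSum {

    // Sum evaluated in double (64-bit IEEE 754) arithmetic.
    static String doubleSum() {
        double sum = .475d + .075d;
        return Double.toString(sum);
    }

    // Same sum evaluated in float (32-bit IEEE 754) arithmetic.
    static String floatSum() {
        float sum = .475f + .075f;
        return Float.toString(sum);
    }

    public static void main(String[] args) {
        System.out.println(doubleSum()); // a value just below 0.55
        System.out.println(floatSum());  // 0.55
        // Did each sum land on the representable value nearest to 0.55?
        System.out.println((.475d + .075d) == .55d); // false
        System.out.println((.475f + .075f) == .55f); // true
    }
}
```

Java float and double arithmetic follows IEEE 754 round-to-nearest, so the outcome should not depend on the machine.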
Until now I was under the impression that double has more precision than float (that is, a double is accurate to more decimal places). So if a value cannot be represented exactly as a double, I assumed its float representation would be inexact as well.
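The first part of that expectation can be checked directly: near 0.55, adjacent double values are much closer together than adjacent float values. A minimal sketch using Math.ulp, which returns the distance from a value to the next representable value larger in magnitude; the class and method names are hypothetical.

```java
// Hypothetical class name; compares the spacing of float and double values near 0.55.
public class UlpNearPointFiveFive {

    // Gap between 0.55f and the next larger float.
    static float floatGap() {
        return Math.ulp(0.55f);
    }

    // Gap between 0.55 and the next larger double.
    static double doubleGap() {
        return Math.ulp(0.55);
    }

    public static void main(String[] args) {
        System.out.println("float gap near 0.55:  " + floatGap());  // 2^-24
        System.out.println("double gap near 0.55: " + doubleGap()); // 2^-53
        // How many times finer the double grid is at this magnitude (2^29)
        System.out.println("ratio: " + (floatGap() / doubleGap()));
    }
}
```

Finer spacing means each stored double is closer to the intended decimal value, but it says nothing about which side of that value the stored number falls on.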
However, these results confuse me. Is it that:
- I have an incorrect understanding of what precision means?
- float and double are represented differently, apart from the fact that double has more bits?
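On the second point: in Java, float and double are the IEEE 754 binary32 and binary64 formats, so they share one structure (sign, biased exponent, fraction) and differ in field widths and exponent bias. A minimal sketch that decodes the fields with Float.floatToIntBits and Double.doubleToLongBits; the class and method names are hypothetical, and the exponent decoding assumes a normal, nonzero value.

```java
// Hypothetical class name; splits a float and a double into their IEEE 754 fields.
public class BitLayout {

    // float (binary32): 1 sign bit, 8 exponent bits (bias 127), 23 fraction bits.
    // The unbiased exponent is only meaningful for normal, nonzero values.
    static String floatFields(float f) {
        int bits = Float.floatToIntBits(f);
        int sign = bits >>> 31;
        int exponent = ((bits >>> 23) & 0xFF) - 127;
        // Left-pad with zeros so all 23 fraction bits are shown
        String fraction = String.format("%23s", Integer.toBinaryString(bits & 0x7FFFFF)).replace(' ', '0');
        return "sign=" + sign + " exp=" + exponent + " fraction=" + fraction;
    }

    // double (binary64): 1 sign bit, 11 exponent bits (bias 1023), 52 fraction bits.
    static String doubleFields(double d) {
        long bits = Double.doubleToLongBits(d);
        long sign = bits >>> 63;
        long exponent = ((bits >>> 52) & 0x7FF) - 1023;
        // Left-pad with zeros so all 52 fraction bits are shown
        String fraction = String.format("%52s", Long.toBinaryString(bits & 0x000FFFFFFFFFFFFFL)).replace(' ', '0');
        return "sign=" + sign + " exp=" + exponent + " fraction=" + fraction;
    }

    public static void main(String[] args) {
        System.out.println(floatFields(0.475f));
        System.out.println(doubleFields(0.475));
    }
}
```

For 0.475 both types should report the same sign and exponent; the double's fraction begins with the same 23 bits as the float's and simply carries the repeating 0011 pattern further.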
Precision just means more bits. A number that cannot be represented exactly as a float may have an exact representation as a double, but the number of such cases is vanishingly small relative to the total number of possible cases.
Simple decimal values such as 0.1 cannot be represented exactly in a fixed-length binary floating-point format, no matter how many bits are available. This is the same as saying that a fraction such as 1/7 cannot be represented exactly in decimal, regardless of how many digits you are allowed to use (as long as that number is finite). You can approximate it as 0.142857142857142857…, repeating over and over again, but you will never be able to write it EXACTLY no matter how long you go on.
Conversely, if a number is representable exactly as a float, it is also representable exactly as a double, because a double has both a larger exponent range and more mantissa bits.
For your example, the apparent discrepancy arises because, in float, the rounding errors happened to fall in the ‘right’ direction, so the result came out as you expected. With the extra precision of double, the stored values are “closer” to the decimal ones, but the error now falls on the opposite side. As a gross example, suppose the closest possible float to 0.475 were 0.475006, while the closest possible double were 0.474999. This would give you the results you see.
Edit: Here are the results of a quick experiment:
Output:
What this means is that the binary approximation of 0.475 lies just a tiny bit below 0.475, as the double representation shows. However, the first ‘wrong’ bit occurs so far to the right that, when the value is rounded to fit in a float, the result still prints as 0.475. This is purely an accident.
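A similar experiment can be sketched with new BigDecimal(double), which converts the stored binary value exactly, so it shows what each literal and each sum really holds. This is a hypothetical stand-in, not the experiment quoted above, and the class and method names are made up. Working through the rounding by hand suggests that 0.475 is stored slightly below 0.475 in both types, while 0.075f lies slightly above 0.075 and 0.075d slightly below it. In float the two errors therefore partly cancel and the sum rounds to the same float as the literal 0.55f; in double they add up and the sum rounds to the double just below 0.55d.

```java
import java.math.BigDecimal;

// Hypothetical class name; prints the exact binary values behind the literals and sums.
public class ExactValues {

    // new BigDecimal(double) is an exact conversion of the stored binary value.
    // A float argument is first widened to double, which is also exact.
    static BigDecimal exact(double stored) {
        return new BigDecimal(stored);
    }

    // Which side of the intended decimal the stored value falls on:
    // negative = below, 0 = exact, positive = above.
    static int side(double stored, String intended) {
        return exact(stored).compareTo(new BigDecimal(intended));
    }

    public static void main(String[] args) {
        System.out.println("0.1d          = " + exact(.1d));
        System.out.println("0.475d        = " + exact(.475d));
        System.out.println("0.475f        = " + exact(.475f));
        System.out.println("0.075d        = " + exact(.075d));
        System.out.println("0.075f        = " + exact(.075f));
        System.out.println("0.475d+0.075d = " + exact(.475d + .075d));
        System.out.println("0.475f+0.075f = " + exact(.475f + .075f));
        System.out.println("0.55f         = " + exact(.55f));
        // Float.toString prints only as many digits as needed to
        // distinguish the value from adjacent floats
        System.out.println("0.475f prints as " + .475f);
    }
}
```

The last line illustrates the printing side of the accident: the float nearest 0.475 is not exactly 0.475, but "0.475" is the shortest decimal that identifies it.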