Why is the result of this explicit cast different from the implicit one?
    #include <stdio.h>

    double a;
    double b;
    double c;
    long d;
    double e;

    int main() {
        a = 1.0;
        b = 2.0;
        c = .1;

        /* Expression converted straight to long. */
        d = (b - a + c) / c;
        printf("%li\n", d); // 10

        /* Same expression stored in a double first, then cast. */
        e = (b - a + c) / c;
        d = (long) e;
        printf("%li\n", d); // 11
    }
If I do d = (long) ((b - a + c) / c); I also get 10. Why does the assignment to a double make a difference?
I suspect the difference is a conversion from an 80-bit floating point value to a long vs a conversion from an 80-bit floating point value to a 64-bit one and then a conversion to a long.
(The reason 80 bits comes up at all is that it's the width of the x87 floating point registers, so on x86 it's a typical precision for the actual arithmetic.)
Suppose the 80-bit result is something like 10.999999999999999 – the conversion from that to a long yields 10. However, the nearest 64-bit floating point value to the 80-bit value is actually 11.0, so the two-stage conversion ends up yielding 11.
EDIT: To give this a bit more weight…
Here’s a Java program which uses arbitrary-precision arithmetic to do the same calculation. Note that it converts the double value closest to 0.1 into a BigDecimal – that value is 0.1000000000000000055511151231257827021181583404541015625. (In other words, the exact result of the calculation is not 11 anyway.)
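A minimal sketch of such a calculation, assuming the quotient is truncated to a chosen number of decimal places. The class name `ExactQuotient` and the `scale` parameter are illustrative and not taken from the original program:

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

class ExactQuotient {
    // Computes (b - a + c) / c with a = 1, b = 2 and c = the double nearest to 0.1.
    // BigDecimal keeps every step exact except the final division, which is
    // truncated (FLOOR) to `scale` decimal places.
    static BigDecimal exactQuotient(int scale) {
        BigDecimal a = new BigDecimal(1.0);
        BigDecimal b = new BigDecimal(2.0);
        BigDecimal c = new BigDecimal(0.1); // exact value of the double, not decimal 0.1
        return b.subtract(a).add(c).divide(c, scale, RoundingMode.FLOOR);
    }

    public static void main(String[] args) {
        // Prints 0.1000000000000000055511151231257827021181583404541015625
        System.out.println(new BigDecimal(0.1));
        // Prints the quotient to 40 decimal places; it is slightly below 11.
        System.out.println(exactQuotient(40));
    }
}
```

Passing the `double` literal `0.1` to the `BigDecimal` constructor (rather than the string `"0.1"`) is what preserves the binary rounding error of the original C variable.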
Here’s the result:
In other words, that’s correct to about 40 decimal digits (way more than either 64 or 80 bit floating point can handle).
Now, let’s consider what this number looks like in binary. I don’t have any tools to easily do the conversion, but again we can use Java to help. Assuming a normalised number, the ’10’ part ends up using three bits after the leading 1 (10 = 1010, the same width as eleven = 1011). That leaves 60 bits of mantissa for extended precision (80 bits, 63 fraction bits) and 49 bits for double precision (64 bits, 52 fraction bits).
So, what’s the closest number to 11 in each precision? Again, let’s use Java:
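One way to sketch this, assuming the spacing of representable values near 11 is 2^-60 in extended precision and 2^-49 in double precision (52 stored fraction bits minus the 3 taken by the integer part). Java has no 80-bit type, so the extended-precision neighbour is only modelled arithmetically; the double-precision one can be cross-checked with `Math.nextDown`. The class and method names are illustrative:

```java
import java.math.BigDecimal;

class NeighboursOfEleven {
    // Returns 11 - 2^-fractionBits, i.e. the next value below 11 in a binary
    // format that has `fractionBits` bits after the binary point near 11.
    static BigDecimal justBelowEleven(int fractionBits) {
        BigDecimal step = new BigDecimal("0.5").pow(fractionBits);
        return new BigDecimal(11).subtract(step);
    }

    public static void main(String[] args) {
        System.out.println(justBelowEleven(60)); // modelled extended precision
        System.out.println(justBelowEleven(49)); // double precision
        // Cross-check: the largest actual double below 11.0, printed exactly.
        System.out.println(new BigDecimal(Math.nextDown(11.0)));
    }
}
```

The last two printed lines should agree, which confirms the double-precision spacing without relying on the bit counting alone.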
Results:
So, the three numbers we’ve got are:
Now work out the closest value to the correct one for each precision – for extended precision, it’s less than 11. Round each of those values to a long, and you end up with 10 and 11 respectively.
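That last step can be sketched in Java as well. This models each format as a grid of multiples of 2^-60 (extended) or 2^-49 (double) near 11 and rounds the exact quotient onto it; it is an approximation of the argument above, not a simulation of what the FPU does instruction by instruction. The names `TwoStageRounding`, `roundToGrid` and `exactQuotient` are illustrative:

```java
import java.math.BigDecimal;
import java.math.BigInteger;
import java.math.RoundingMode;

class TwoStageRounding {
    // (2 - 1 + c) / c for c = the double nearest 0.1, to 60 decimal places.
    static BigDecimal exactQuotient() {
        BigDecimal c = new BigDecimal(0.1);
        return BigDecimal.ONE.add(c).divide(c, 60, RoundingMode.FLOOR);
    }

    // Rounds x to the nearest multiple of 2^-fractionBits (ties to even).
    static BigDecimal roundToGrid(BigDecimal x, int fractionBits) {
        BigDecimal scale = new BigDecimal(BigInteger.ONE.shiftLeft(fractionBits));
        BigDecimal units = x.multiply(scale).setScale(0, RoundingMode.HALF_EVEN);
        return units.divide(scale); // exact: division by a power of two terminates
    }

    public static void main(String[] args) {
        BigDecimal exact = exactQuotient();
        BigDecimal extended = roundToGrid(exact, 60); // stays just below 11
        BigDecimal asDouble = roundToGrid(exact, 49); // rounds up to exactly 11

        // longValue() discards the fractional part, like a C cast to long.
        System.out.println(extended.longValue()); // 10
        System.out.println(asDouble.longValue()); // 11
    }
}
```

The exact quotient is about 5.6e-16 below 11, which is many steps of the finer grid but less than half a step of the coarser one, so only the coarser rounding lands on 11.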
Hopefully this is enough evidence to convince the doubters 😉