#include <inttypes.h> /* PRIx64 */
#include <stdint.h>   /* uint64_t */
#include <stdio.h>    /* printf */

int main(int argc, char *argv[])
{
    uint64_t length = 0x4f56aa5d4b2d8a80;
    uint64_t new_length = 0;
    new_length = length + 119.000000;
    printf("new length 0x%" PRIx64 "\n", new_length);
    new_length = length + 238.000000;
    printf("new length 0x%" PRIx64 "\n", new_length);
    return 0;
}
With the above code, I am adding two different double values to an unsigned 64-bit integer, but I get exactly the same result in both cases. The output of the program is shown below:
$./a.out
new length 0x4f56aa5d4b2d8c00
new length 0x4f56aa5d4b2d8c00
I would expect two different results, but that is not the case. I have also tried casting the uint64_t value to a double, as in
new_length = (double)length + 119.000000;
but this doesn't help either. Any idea what the problem might be?
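Floating-point arithmetic is not exact. A double has a 53-bit significand, so every integer up to 2^53 can be represented, but above that the gap between neighbouring doubles keeps growing and the low-order digits are lost.
0x4f56aa5d4b2d8a80 is a very large number: about 5.7 × 10^18, between 2^62 and 2^63. At that magnitude adjacent doubles are 2^10 = 1024 apart, so every double result in that range is a multiple of 1024.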
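What is happening in
new_length = length + 119.000000;
is that length is converted to double so that the addition can be done in floating point (C's usual arithmetic conversions). Because the value is so large, that conversion rounds it rather dramatically, to the nearest multiple of 1024: its low bits are 0x280 (640), more than half of 1024, so it becomes 0x4f56aa5d4b2d8c00. The sum is then rounded to the nearest double again, and that result is converted back to the integral type uint64_t when it is assigned to new_length. When you do
new_length = length + 238.000000;
the same thing happens. Since both 119 and 238 are less than half of 1024, the rounded result ends up being the same, 0x4f56aa5d4b2d8c00. This is also why writing (double)length explicitly does not help: it is exactly the conversion the compiler was already performing.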
What you really want to do is
new_length = length + (uint64_t)119.000000;
That will give you the answer you want. It converts the double to an integral type first, and the addition is then done exactly in 64-bit unsigned integer arithmetic.