I know that 511 divided by 512 equals exactly 0.998046875. I also know that the precision of floats is 7 digits. When I do this math in C++ (GCC), the result I get is 0.998047, which is a rounded value. I'd prefer to get the truncated value, 0.998046. How can I do that?
float a = 511.0f;
float b = 512.0f;
float c = a / b;
Well, here's one problem. The value of 511/512, as a float, is exact. No rounding is done. You can check this by asking for more than seven digits of output.
A float is stored not as a decimal number but as a binary one. If you divide a number by a power of 2, such as 512, the result will almost always be exact. What's going on is that the precision of a float is not simply 7 digits; it is really 23 bits of stored significand (24 counting the implicit leading bit).
See What Every Computer Scientist Should Know About Floating-Point Arithmetic.