Possible Duplicate: Floating point comparison I have a problem about the accuracy of float

Editorial Team

Asked: June 12, 2026

Possible Duplicate:
Floating point comparison

I have a problem with the accuracy of float in C/C++. When I execute the program below:

#include <stdio.h>

int main (void) {
    float a = 101.1;
    double b = 101.1;
    printf ("a: %f\n", a);
    printf ("b: %lf\n", b);
    return 0;
}

Result:

a: 101.099998
b: 101.100000

I believe a float has 32 bits, which should be enough to store 101.1. Why is a printed as 101.099998?

1 Answer

Editorial Team · Answer 1 · 2026-06-12T01:03:56+00:00

In IEEE 754 (at least in the single- and double-precision binary formats), a number can be represented exactly only if it can be built by adding together powers of two, including inverted ones (2^-n, such as 1/2, 1/4, 1/65536 and so on), subject to the number of bits available for precision.

There is no combination of inverted powers of two that will get you exactly to 101.1, within the scaling provided by floats (23 stored mantissa bits, 24 counting the implicit leading 1) or doubles (52 stored mantissa bits, 53 counting the implicit leading 1).

If you want a quick tutorial on how this inverted-power-of-two stuff works, see this answer.

Applying the knowledge from that answer to your 101.1 number (as a single precision float):

s eeeeeeee mmmmmmmmmmmmmmmmmmmmmmm    1/n
0 10000101 10010100011001100110011
           |  | |   ||  ||  ||  |+- 8388608
           |  | |   ||  ||  ||  +-- 4194304
           |  | |   ||  ||  |+-----  524288
           |  | |   ||  ||  +------  262144
           |  | |   ||  |+---------   32768
           |  | |   ||  +----------   16384
           |  | |   |+-------------    2048
           |  | |   +--------------    1024
           |  | +------------------      64
           |  +--------------------      16
           +-----------------------       2

The mantissa part of that actually continues forever for 101.1:

mmmmmmmmm mmmm mmmm mmmm mm
100101000 1100 1100 1100 11|00 1100 (and so on).

Hence it is not just a matter of precision: no finite number of bits will represent that number exactly in an IEEE 754 binary format.

Using the bits to calculate the actual number (the closest approximation): the sign is positive. The stored exponent is 128 + 4 + 1 = 133; subtracting the bias of 127 gives 6, so the multiplier is 2⁶, or 64.

The mantissa consists of 1 (the implicit leading bit) plus the value of every set bit, where the nth bit from the left is worth 1/2ⁿ (n starting at 1): {1/2, 1/16, 1/64, 1/1024, 1/2048, 1/16384, 1/32768, 1/262144, 1/524288, 1/4194304, 1/8388608}.

When you add all these up, you get 1.57968747615814208984375.

When you multiply that by the multiplier previously calculated, 64, you get 101.09999847412109375.

All numbers were calculated with bc using a scale of 100 decimal digits, resulting in a lot of trailing zeros, so the numbers should be very accurate. Doubly so, since I checked the result with:

#include <stdio.h>
int main (void) {
    float f = 101.1f;
    printf ("%.50f\n", f);
    return 0;
}

which also gave me 101.09999847412109375000....
