I know a little bit about how floating-point numbers are represented, but not enough,

Question

0

Editorial Team

Asked: May 11, 20262026-05-11T19:47:56+00:00 2026-05-11T19:47:56+00:00

I know a little bit about how floating-point numbers are represented, but not enough,

0

I know a little bit about how floating-point numbers are represented, but not enough, I’m afraid.

The general question is:

For a given precision (for my purposes, the number of accurate decimal places in base 10), what range of numbers can be represented for 16-, 32- and 64-bit IEEE-754 systems?

Specifically, I’m only interested in the range of 16-bit and 32-bit numbers accurate to +/-0.5 (the ones place) or +/- 0.0005 (the thousandths place).

Report

Leave an answer
Cancel reply

You must login to add an answer.

Need An Account,

1 Answer

Editorial Team · Answer 1 · 2026-05-11T19:47:57+00:00

For a given IEEE-754 floating point number X, if

2^E <= abs(X) < 2^(E+1)

then the distance from X to the next largest representable floating point number (epsilon) is:

epsilon = 2^(E-52)    % For a 64-bit float (double precision)
epsilon = 2^(E-23)    % For a 32-bit float (single precision)
epsilon = 2^(E-10)    % For a 16-bit float (half precision)

The above equations allow us to compute the following:

For half precision…

If you want an accuracy of +/-0.5 (or 2^-1), the maximum size that the number can be is 2^10. Any X larger than this limit leads to the distance between floating point numbers greater than 0.5.

If you want an accuracy of +/-0.0005 (about 2^-11), the maximum size that the number can be is 1. Any X larger than this maximum limit leads to the distance between floating point numbers greater than 0.0005.
For single precision…

If you want an accuracy of +/-0.5 (or 2^-1), the maximum size that the number can be is 2^23. Any X larger than this limit leads to the distance between floating point numbers being greater than 0.5.

If you want an accuracy of +/-0.0005 (about 2^-11), the maximum size that the number can be is 2^13. Any X larger than this lmit leads to the distance between floating point numbers being greater than 0.0005.
For double precision…

If you want an accuracy of +/-0.5 (or 2^-1), the maximum size that the number can be is 2^52. Any X larger than this limit leads to the distance between floating point numbers being greater than 0.5.

If you want an accuracy of +/-0.0005 (about 2^-11), the maximum size that the number can be is 2^42. Any X larger than this limit leads to the distance between floating point numbers being greater than 0.0005.

Sign Up

Sign In

Forgot Password

The Archive Base Latest Questions

I know a little bit about how floating-point numbers are represented, but not enough,

Leave an answerCancel reply

1 Answer

Leave an answer
Cancel reply