Question

Asked: May 13, 2026 (2026-05-13T17:30:14+00:00)

I was wondering about how bits are organized on floats (4 bytes), double (8 bytes) and half floats (2 bytes, used on OpenGL implementation).

Further, how could I convert from one to another?

Editorial Team · Answer 1 · 2026-05-13T17:30:14+00:00

In essence, each of these formats consists of a sign bit, x exponent bits (E), and y mantissa bits (M).

If the sign bit is 1, the number is negative, else it is positive.

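To get the magnitude of a normal number, you take (1 + M) * 2^(E - k), where M is read as a binary fraction in [0, 1) (the mantissa bits as an integer, divided by 2 to the power of the number of mantissa bits) and k (called the “exponent bias”) depends on the format.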
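The formula above can be sketched in Python for the single-precision case. This is only an illustration: the function name `decode_float32` is made up here, the field widths (8 exponent bits, 23 mantissa bits) and the bias of 127 are the standard single-precision values, and the reconstruction is valid for normal numbers only.

```python
import struct


def decode_float32(value):
    """Split a number into single-precision fields and rebuild its value.

    Hypothetical helper for illustration; handles normal numbers only.
    """
    # Pack as a big-endian float32, then reinterpret the same 4 bytes
    # as an unsigned 32-bit integer to get at the raw bits.
    (bits,) = struct.unpack(">I", struct.pack(">f", value))

    sign = bits >> 31                # top bit
    exponent = (bits >> 23) & 0xFF   # next 8 bits (E)
    mantissa = bits & 0x7FFFFF       # low 23 bits (M as a raw integer)

    # (1 + M) * 2^(E - k), with M scaled to a fraction and k = 127
    magnitude = (1 + mantissa / 2**23) * 2.0 ** (exponent - 127)
    return sign, exponent, mantissa, -magnitude if sign else magnitude


print(decode_float32(-6.5))  # (1, 129, 5242880, -6.5)
```

Here -6.5 is -1.625 * 2^2, so E = 2 + 127 = 129 and the fraction 0.625 is stored as 0.625 * 2^23 = 5242880.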

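It’s worth noting that certain combinations of sign, exponent, and mantissa are “special” values that do not follow the formula above. An exponent field of all zeros encodes 0 (when the mantissa is also zero) or a subnormal number; an exponent field of all ones encodes -inf or +inf (when the mantissa is zero) or NaN (when it is not).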
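As a rough sketch of how those special cases could be told apart, the following classifies a raw 32-bit single-precision pattern. The function name `classify_float32` and the returned labels are invented for this example; the rule it applies is the IEEE 754 convention that an all-zeros or all-ones exponent field marks a special value.

```python
def classify_float32(bits):
    """Label a raw 32-bit pattern (hypothetical helper for illustration)."""
    sign = "-" if bits >> 31 else "+"
    exponent = (bits >> 23) & 0xFF   # 8 exponent bits
    mantissa = bits & 0x7FFFFF       # 23 mantissa bits

    if exponent == 0xFF:
        # All-ones exponent: infinity if the mantissa is zero, else NaN.
        return sign + "inf" if mantissa == 0 else "nan"
    if exponent == 0:
        # All-zeros exponent: signed zero or a subnormal number.
        return sign + "0" if mantissa == 0 else "subnormal"
    return "normal"


for pattern in (0x7F800000, 0xFF800000, 0x7FC00000, 0x80000000, 0x00000001):
    print(hex(pattern), classify_float32(pattern))
```

The same test works for half and double precision once the shift amounts and masks are adjusted to those formats' field widths.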

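For the specifics: single precision (4 bytes) has 8 exponent bits, 23 mantissa bits, and a bias of 127; double precision (8 bytes) has 11 exponent bits, 52 mantissa bits, and a bias of 1023; half precision (2 bytes) has 5 exponent bits, 10 mantissa bits, and a bias of 15. See the Wikipedia articles on each format for the details.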

Note that these are all specified by IEEE 754, so googling that might give you helpful results. 🙂
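On the conversion part of the question, one possible approach in Python is to let the standard `struct` module do the work, since its `e`, `f`, and `d` format codes correspond to half, single, and double precision. The helper names `to_format` and `raw_bits` below are made up for this sketch, and values outside the target format's range are not handled.

```python
import struct


def to_format(value, code):
    """Round a Python float (a double) to the given struct format and back.

    code is "e" (half), "f" (single), or "d" (double).
    Hypothetical helper for illustration.
    """
    return struct.unpack("<" + code, struct.pack("<" + code, value))[0]


def raw_bits(value, code):
    """Return the bit pattern of value in the given format as an integer."""
    return int.from_bytes(struct.pack("<" + code, value), "little")


x = 0.1
print(to_format(x, "e"))   # nearest half-precision value to 0.1
print(to_format(x, "f"))   # nearest single-precision value to 0.1
print(to_format(x, "d"))   # unchanged: Python floats are already doubles

print(hex(raw_bits(1.0, "e")))  # 0x3c00
print(hex(raw_bits(1.0, "f")))  # 0x3f800000
```

Widening (half to single, single to double) is exact, while narrowing rounds to the nearest representable value, so a round trip through a smaller format generally loses precision.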