I was wondering how the bits are laid out in floats (4 bytes), doubles (8 bytes), and half floats (2 bytes, used in OpenGL implementations).
Also, how can I convert from one to another?
In essence, each of these formats has one sign bit, x exponent bits (E), and y mantissa bits (M).
If the sign bit is 1, the number is negative, else it is positive.
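To get the magnitude, you take (1 + M) * 2^(E - k), where M is the mantissa field read as a binary fraction in [0, 1), E is the exponent field read as an unsigned integer, and k (called the “exponent bias”) depends on the format.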
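To make that formula concrete, here is a minimal Python sketch that pulls the three fields out of a number and rebuilds its value. The names `FORMATS` and `decode` are made up for this illustration, and it relies on the standard `struct` module's "e", "f", and "d" codes for half, single, and double. It only covers normal numbers, not the special cases.

```python
import struct

# Hypothetical lookup table for this sketch:
# (struct format code, exponent bits x, mantissa bits y, exponent bias k)
FORMATS = {
    "half":   ("e", 5, 10, 15),
    "single": ("f", 8, 23, 127),
    "double": ("d", 11, 52, 1023),
}

def decode(value, fmt="single"):
    """Return (sign, E, M, rebuilt_value) for a normal number.

    Zero, subnormals, infinities and NaN are deliberately not handled.
    """
    code, x, y, k = FORMATS[fmt]
    # Pack big-endian so the sign bit is the most significant bit of the integer.
    raw = int.from_bytes(struct.pack(">" + code, value), "big")
    sign = raw >> (x + y)                   # top bit
    E = (raw >> y) & ((1 << x) - 1)         # next x bits, as an unsigned integer
    M = (raw & ((1 << y) - 1)) / (1 << y)   # low y bits, as a fraction in [0, 1)
    magnitude = (1 + M) * 2.0 ** (E - k)
    return sign, E, M, (-magnitude if sign else magnitude)
```

For example, `decode(-6.5, "single")` gives a sign of 1, E = 129 and M = 0.625, since 6.5 = 1.625 * 2^(129 - 127).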
It’s worth noting that certain combinations of sign, exponent, and mantissa are “special” values, like 0, -inf, +inf, and NaN.
For the specifics (values of x, y, and k), see the Wikipedia articles on single precision (4 bytes: x = 8, y = 23, k = 127), double precision (8 bytes: x = 11, y = 52, k = 1023), and half precision (2 bytes: x = 5, y = 10, k = 15).
Note that these are all specified by IEEE 754, so googling that might give you helpful results. 🙂
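As for converting from one to another, one possible approach (a sketch, not the only way) is to let a library do the rounding for you. In Python the standard `struct` module can pack a value into any of the three sizes and unpack it again. The helper names `convert` and `bits` below are made up for this example.

```python
import struct

def convert(value, code):
    """Round a Python float (a double) to the format named by a struct code.

    "e" = half, "f" = single, "d" = double. The result comes back as a Python
    float, i.e. the nearest value of the target format widened to a double.
    """
    return struct.unpack(code, struct.pack(code, value))[0]

def bits(value, code):
    """Return the raw bit pattern as hex, most significant byte first."""
    return struct.pack(">" + code, value).hex()

if __name__ == "__main__":
    for code in ("e", "f", "d"):
        print(code, bits(0.1, code), convert(0.1, code))
```

Widening (half to single, single to double) is exact, because every value of the smaller format is also a value of the larger one. Narrowing has to round, which is why 0.1 comes back slightly different at each size.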