How do I add and subtract 16 bit floating point half precision numbers?
Say I need to add or subtract:
1 10000 0000000000
1 01111 1111100000
2’s complement form.
Sign Up to our social questions and Answers Engine to ask questions, answer people’s questions, and connect with other people.
Login to our social questions & Answers Engine to ask questions answer people’s questions & connect with other people.
Lost your password? Please enter your email address. You will receive a link and will create a new password via email.
Please briefly explain why you feel this question should be reported.
Please briefly explain why you feel this answer should be reported.
Please briefly explain why you feel this user should be reported.
Assuming you are using a denormalized representation similar to that of IEEE single/double precision, just compute the sign = (-1)^S, the mantissa as 1.M if E != 0 and 0.M if E == 0, and the exponent = E – 2^(n-1), operate on these natural representations, and convert back to the 16-bit format.
sign1 = -1
mantissa1 = 1.0
exponent1 = 1
sign2 = -1
mantissa2 = 1.11111
exponent2 = 0
sum:
sign = -1
mantissa = 1.111111
exponent = 1
Representation: 1 10000 1111110000
Naturally, this assumes excess encoding of the exponent.