How do I add and subtract 16 bit floating point half precision numbers?

Question

Editorial Team

Asked: May 25, 2026

How do I add and subtract 16 bit floating point half precision numbers?

Say I need to add or subtract:

1 10000 0000000000

1 01111 1111100000

2’s complement form.

1 Answer

Editorial Team · Answer 1 · 2026-05-25T23:29:36+00:00

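Assuming your format follows the same conventions as IEEE single/double precision (including denormalized numbers), decode each operand: compute sign = (-1)^S; take the mantissa as 1.M if E != 0 and 0.M if E == 0; and compute exponent = E - (2^(n-1) - 1), where n is the width of the exponent field (n = 5 here, so the bias is 15; when E == 0 the exponent is fixed at 1 - 15 = -14). Operate on these natural representations, then normalize the result and convert it back to the 16-bit format.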

sign1 = -1
mantissa1 = 1.0
exponent1 = 1

sign2 = -1
mantissa2 = 1.11111
exponent2 = 0

sum (shift mantissa2 right by one bit so both operands share exponent 1, then add 1.000000 + 0.111111):
sign = -1
mantissa = 1.111111
exponent = 1

Representation: 1 10000 1111110000

Naturally, this assumes excess (biased) encoding of the exponent, with a bias of 15 for the 5-bit field.
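The decode / operate / re-encode steps can be sketched in Python. This is only an illustration under stated assumptions: the format is IEEE binary16 with bias 15, infinities and NaNs (E == 31) are not handled, and the final rounding is delegated to the standard library's struct 'e' format rather than done by hand. The helper names (decode_half, half_value, encode_half, add_half, sub_half, fields) are made up for this example.

```python
import struct

BIAS = 15  # 2**(5-1) - 1 for a 5-bit exponent field (assumed IEEE binary16)

def decode_half(bits):
    """Split a 16-bit pattern into (sign, mantissa, exponent)."""
    s = (bits >> 15) & 0x1
    e = (bits >> 10) & 0x1F
    m = bits & 0x3FF
    sign = -1 if s else 1
    if e == 0:
        # zero or denormal: mantissa is 0.M, exponent is fixed at 1 - bias
        return sign, m / 1024, 1 - BIAS
    # normal number: mantissa is 1.M (E == 31, inf/NaN, is not handled here)
    return sign, 1 + m / 1024, e - BIAS

def half_value(bits):
    """Numeric value of a 16-bit pattern as a Python float."""
    sign, mantissa, exponent = decode_half(bits)
    return sign * mantissa * 2.0 ** exponent

def encode_half(value):
    """Round a Python float to binary16 and return its 16-bit pattern."""
    return struct.unpack('<H', struct.pack('<e', value))[0]

def add_half(a_bits, b_bits):
    return encode_half(half_value(a_bits) + half_value(b_bits))

def sub_half(a_bits, b_bits):
    return encode_half(half_value(a_bits) - half_value(b_bits))

def fields(bits):
    """Format a pattern as 'S EEEEE MMMMMMMMMM'."""
    b = format(bits, '016b')
    return f"{b[0]} {b[1:6]} {b[6:]}"

a = 0b1_10000_0000000000  # -2.0
b = 0b1_01111_1111100000  # -1.96875

print(fields(add_half(a, b)))  # 1 10000 1111110000  (-3.96875)
print(fields(sub_half(a, b)))  # 1 01010 0000000000  (-0.03125)
```

The sum matches the representation worked out by hand above. A hardware-style implementation would instead align the mantissas, add or subtract them as integers, then renormalize and round; the sketch skips that by doing the arithmetic in double precision, which is wide enough to hold any sum or difference of two half-precision values before the final rounding.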
