x86 does not have an SSE instruction to convert from unsigned int32 to floating

Question

0

Asked: May 29, 20262026-05-29T10:01:55+00:00 2026-05-29T10:01:55+00:00

x86 does not have an SSE instruction to convert from unsigned int32 to floating

0

x86 does not have an SSE instruction to convert from unsigned int32 to floating point. What would be the most efficient instruction sequence for achieving this?

EDIT:
To clarify, i want to do the vector sequence of the following scalar operation:

unsigned int x = ...
float res = (float)x;

EDIT2: Here is a naive algorithm for doing a scalar conversion.

unsigned int x = ...
float bias = 0.f;
if (x > 0x7fffffff) {
    bias = (float)0x80000000;
    x -= 0x80000000;
}
res = signed_convert(x) + bias;

Report

Leave an answer
Cancel reply

You must login to add an answer.

Need An Account,

1 Answer

Editorial Team · Answer 1 · 2026-05-29T10:01:56+00:00

Your naive scalar algorithm doesn’t deliver a correctly-rounded conversion — it will suffer from double rounding on certain inputs. As an example: if x is 0x88000081, then the correctly-rounded result of conversion to float is 2281701632.0f, but your scalar algorithm will return 2281701376.0f instead.

Off the top of my head, you can do a correct conversion as follows (as I said, this is off the top of my head, so it’s likely possible to save an instruction somewhere):

movdqa   xmm1,  xmm0    // make a copy of x
psrld    xmm0,  16      // high 16 bits of x
pand     xmm1, [mask]   // low 16 bits of x
orps     xmm0, [onep39] // float(2^39 + high 16 bits of x)
cvtdq2ps xmm1, xmm1     // float(low 16 bits of x)
subps    xmm0, [onep39] // float(high 16 bits of x)
addps    xmm0,  xmm1    // float(x)

where the constants have the following values:

mask:   0000ffff 0000ffff 0000ffff 0000ffff
onep39: 53000000 53000000 53000000 53000000

What this does is separately convert the high- and low-halves of each lane to floating-point, then add these converted values together. Because each half is only 16 bits wide, the conversion to float does not incur any rounding. Rounding only occurs when the two halves are added; because addition is a correctly-rounded operation, the entire conversion is correctly rounded.

By contrast, your naive implementation first converts the low 31 bits to float, which incurs a rounding, then conditionally adds 2^31 to that result, which may cause a second rounding. Any time you have two separate rounding points in a conversion, unless you are exceedingly careful about how they occur, you should not expect the result to be correctly rounded.

Sign Up

Sign In

Forgot Password

The Archive Base Latest Questions

x86 does not have an SSE instruction to convert from unsigned int32 to floating

Leave an answerCancel reply

1 Answer

Leave an answer
Cancel reply