I want to convert a floating point value to a 16-bit unsigned integer without saturating (wraparound/overflow instead).
```cpp
#include <cstdint>
#include <cstdlib>
#include <iostream>
#include <emmintrin.h>  // SSE2: __m128i, _mm_cvttps_epi32, _mm_packs_epi32 (also pulls in <xmmintrin.h>)

void satur_wrap()
{
    const float bigVal = 99000.f;
    const __m128 bigValVec = _mm_set1_ps(bigVal);
#if 0
    // MMX path: float -> signed 16-bit with saturation.
    const __m64 outVec64 = _mm_cvtps_pi16(bigValVec);
    const __m128i outVec = _mm_movpi64_epi64(outVec64);
#else
#if 1
    // Truncate to int32, then narrow to 16 bits (signed saturation).
    const __m128i outVec = _mm_packs_epi32(_mm_cvttps_epi32(bigValVec), _mm_cvttps_epi32(bigValVec));
#else
    const __m128i outVec = _mm_cvttps_epi32(bigValVec);
#endif
#endif
    uint16_t *outVals = nullptr;
    if (posix_memalign(reinterpret_cast<void **>(&outVals), sizeof(__m128i), sizeof(__m128i)) != 0)
        return;
    _mm_store_si128(reinterpret_cast<__m128i *>(outVals), outVec);
    for (std::size_t i = 0; i < sizeof(outVec) / sizeof(*outVals); i++)
    {
        std::cout << "outVals[" << i << "]: " << outVals[i] << std::endl;
    }
    std::cout << std::endl
              << "\tbigVal: " << bigVal << std::endl
              << "\t(unsigned short) bigVal: " << ((unsigned short) bigVal) << std::endl
              << "\t((unsigned short)((int) bigVal)): " << ((unsigned short)((int) bigVal)) << std::endl
              << std::endl;
    free(outVals);
}

int main()
{
    satur_wrap();
}
```
Sample execution:
```
$ ./row
outVals[0]: 32767
outVals[1]: 32767
outVals[2]: 32767
outVals[3]: 32767
outVals[4]: 32767
outVals[5]: 32767
outVals[6]: 32767
outVals[7]: 32767

	bigVal: 99000
	(unsigned short) bigVal: 65535
	((unsigned short)((int) bigVal)): 33464
```
The ((unsigned short)((int) bigVal)) expression works as desired (but it’s probably UB, right?). But I can’t find something quite similar with SSE. I must be missing something, but I couldn’t find a primitive to convert four 32-bit floats to four 32-bit ints.
EDIT: Oops, I figured it would be “normal” for 32-bit integer -> 16-bit unsigned integer conversion to use wraparound. But I’ve since learned that _mm_packs_epi32 uses signed saturation (and there doesn’t appear to be a _mm_packus_epi32). Is there a way to set the mode, or another primitive I could use in place of _mm_packus_epi32?
I’m answering only the part of the question concerning 32-bit integer -> 16-bit unsigned integer conversion.
Since you need wraparound, just take the low-order word of each double-word containing a 32-bit integer. These 16-bit integers are interleaved with 16-bit pieces of unused data, so it may be convenient to pack them into a contiguous array. The easiest way to do this is the _mm_shuffle_epi8 intrinsic (SSSE3).

If you want your program to be more portable and require only the SSE2 instruction set, you can pack the values with _mm_packs_epi32 but disable its saturating behavior with a trick: before packing, shift each 32-bit lane left by 16 bits and then arithmetically right by 16 bits (_mm_slli_epi32 followed by _mm_srai_epi32). This works because it sign-extends the 16-bit values, which makes signed saturation a no-op.
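Here is a minimal sketch of these approaches, assuming GCC or Clang on x86-64 (where SSE2 is baseline). The function names are hypothetical. The __attribute__((target(...))) annotations only let the SSSE3 and SSE4.1 helpers compile without global -mssse3/-msse4.1 flags; the CPU must still support those extensions at run time. For completeness it also includes the SSE4.1 zero-extension variant with _mm_packus_epi32 mentioned next, plus a float front end built on _mm_cvttps_epi32, which truncates toward zero like the scalar (int) cast but returns 0x80000000 for inputs outside the int32 range.

```cpp
// Sketch only: wraparound (modulo 2^16) narrowing of int32 lanes to uint16.
// Assumes GCC or Clang on x86-64; all function names are hypothetical.
#include <cstdint>
#include <emmintrin.h>  // SSE2
#include <tmmintrin.h>  // SSSE3: _mm_shuffle_epi8
#include <smmintrin.h>  // SSE4.1: _mm_packus_epi32

// SSSE3: gather the low word of each 32-bit lane (bytes 0-1, 4-5, 8-9, 12-13)
// into the low 64 bits; control bytes of -1 zero the upper 64 bits.
static __attribute__((target("ssse3"))) __m128i pack_wrap_ssse3(__m128i v)
{
    const __m128i ctrl = _mm_setr_epi8(0, 1, 4, 5, 8, 9, 12, 13,
                                       -1, -1, -1, -1, -1, -1, -1, -1);
    return _mm_shuffle_epi8(v, ctrl);
}

// SSE2: sign-extend the low word of each lane (shift left 16, arithmetic
// shift right 16), so the signed saturation in _mm_packs_epi32 never fires.
static __m128i pack_wrap_sse2(__m128i lo, __m128i hi)
{
    lo = _mm_srai_epi32(_mm_slli_epi32(lo, 16), 16);
    hi = _mm_srai_epi32(_mm_slli_epi32(hi, 16), 16);
    return _mm_packs_epi32(lo, hi);  // words 0-3 from lo, words 4-7 from hi
}

// SSE4.1: zero-extend the low word of each lane instead, so the unsigned
// saturation in _mm_packus_epi32 never fires.
static __attribute__((target("sse4.1"))) __m128i pack_wrap_sse41(__m128i lo, __m128i hi)
{
    const __m128i mask = _mm_set1_epi32(0xFFFF);
    return _mm_packus_epi32(_mm_and_si128(lo, mask), _mm_and_si128(hi, mask));
}

// Eight floats -> eight uint16 values, truncating toward zero and wrapping
// modulo 2^16. Only meaningful for |x| < 2^31: out-of-range inputs make
// _mm_cvttps_epi32 return 0x80000000, whose low word is 0.
static void floats_to_u16_wrap(const float *in, uint16_t *out)
{
    const __m128i lo = _mm_cvttps_epi32(_mm_loadu_ps(in));
    const __m128i hi = _mm_cvttps_epi32(_mm_loadu_ps(in + 4));
    _mm_storeu_si128(reinterpret_cast<__m128i *>(out), pack_wrap_sse2(lo, hi));
}
```

For example, feeding 99000.f through floats_to_u16_wrap yields 33464, the same value as ((unsigned short)((int) bigVal)) in the question.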
The same trick works with _mm_packus_epi32 if you zero-extend the 16-bit values instead, for example by ANDing each lane with 0xFFFF (_mm_and_si128). This works because zero extension makes unsigned saturation a no-op. Zero extension is easier to perform, but you need the SSE4.1 instruction set to make _mm_packus_epi32 available.

It is possible to pack eight 16-bit integers with a single instruction, _mm_perm_epi8, but it requires the rather rare XOP instruction set.

And here are a few words about saturated conversion.
In fact, the _mm_packus_epi32 intrinsic is available if you change #include <xmmintrin.h> to #include <smmintrin.h> or #include <x86intrin.h>. Both your CPU and your compiler need to support SSE4.1 (with GCC or Clang this usually also means compiling with -msse4.1). If you have no SSE4.1-compatible CPU or compiler, or want your program to be more portable, substitute the _mm_packus_epi32 intrinsic with equivalent SSE2-only code.
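One possible SSE2-only substitute, sketched under the assumption that you want exactly _mm_packus_epi32's semantics (negative lanes clamp to 0, lanes above 65535 clamp to 65535): first clamp each signed 32-bit lane to [0, 65535] using _mm_cmpgt_epi32 masks (a signed comparison), then reuse the sign-extension trick from the wraparound case so that _mm_packs_epi32 copies the low words unchanged. The name packus_epi32_sse2 is hypothetical.

```cpp
// Sketch only: an SSE2-only stand-in for _mm_packus_epi32 (SSE4.1).
// Narrows the signed 32-bit lanes of a and b to unsigned 16-bit with
// saturation. The function name is hypothetical.
#include <emmintrin.h>  // SSE2 only

static __m128i packus_epi32_sse2(__m128i a, __m128i b)
{
    const __m128i zero  = _mm_setzero_si128();
    const __m128i max16 = _mm_set1_epi32(0xFFFF);

    // Clamp below: lanes <= 0 are ANDed with an all-zero mask.
    a = _mm_and_si128(a, _mm_cmpgt_epi32(a, zero));
    b = _mm_and_si128(b, _mm_cmpgt_epi32(b, zero));

    // Clamp above: lanes > 65535 become all ones, whose low word is 0xFFFF.
    a = _mm_or_si128(a, _mm_cmpgt_epi32(a, max16));
    b = _mm_or_si128(b, _mm_cmpgt_epi32(b, max16));

    // Each lane's low word now holds the result; sign-extend it so that
    // _mm_packs_epi32 copies it bit for bit instead of saturating.
    a = _mm_srai_epi32(_mm_slli_epi32(a, 16), 16);
    b = _mm_srai_epi32(_mm_slli_epi32(b, 16), 16);
    return _mm_packs_epi32(a, b);  // words 0-3 from a, words 4-7 from b
}
```

It should give the same result as _mm_packus_epi32(a, b) for every int32 input, at the cost of several extra instructions.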