I’m using VS2005 (at work) and need an SSE intrinsic that does the following:
I have a pre-existing __m128i n filled with eight 16-bit integers a_1, a_2, ..., a_8.
Since some calculations that I now want to do require 32 bits instead of 16, I want to extract the two groups of four 16-bit integers from n and widen them into two separate __m128i values, which contain a_1, ..., a_4 and a_5, ..., a_8 respectively.
I could do this manually using the various _mm_set intrinsics, but those would result in eight movs in assembly, and I’d hoped that there would be a faster way to do this.
Assuming that I understand correctly what it is that you want to achieve (unpack 8 x 16-bit ints in one vector into two vectors of 4 x 32-bit ints), I typically do it like this in SSE2 and later:
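A minimal sketch of that unpacking, using only SSE2 intrinsics from `<emmintrin.h>`. The helper names `widen_s16_to_s32` and `widen_u16_to_u32` are made up for illustration, and which of the two applies depends on whether a_1, ..., a_8 are signed or unsigned, which the question does not say.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */

/* Hypothetical helper: widen eight SIGNED 16-bit lanes of n into
   two vectors of four signed 32-bit lanes each. */
static void widen_s16_to_s32(__m128i n, __m128i *lo, __m128i *hi)
{
    /* Interleaving n with itself puts a_i into both halves of each
       32-bit lane; an arithmetic right shift by 16 then leaves a_i
       sign-extended to 32 bits. */
    *lo = _mm_srai_epi32(_mm_unpacklo_epi16(n, n), 16);  /* a_1..a_4 */
    *hi = _mm_srai_epi32(_mm_unpackhi_epi16(n, n), 16);  /* a_5..a_8 */
}

/* Hypothetical helper: the same for UNSIGNED 16-bit lanes. */
static void widen_u16_to_u32(__m128i n, __m128i *lo, __m128i *hi)
{
    /* Interleaving with zero fills the upper half of each 32-bit
       lane with 0, which is exactly zero extension. */
    const __m128i zero = _mm_setzero_si128();
    *lo = _mm_unpacklo_epi16(n, zero);  /* a_1..a_4 */
    *hi = _mm_unpackhi_epi16(n, zero);  /* a_5..a_8 */
}
```

Each output vector costs one unpack plus at most one shift, instead of rebuilding the values element by element with `_mm_set_epi32`. On toolchains that target SSE4.1, `_mm_cvtepi16_epi32` performs the sign-extending widening of the low four lanes in a single instruction, but that goes beyond SSE2.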