The SSE shift instructions I have found can only shift by the same amount

Editorial Team

Asked: June 6, 2026

The SSE shift instructions I have found can only shift by the same amount on all the elements:

_mm_sll_epi32()
_mm_slli_epi32()

These shift all elements, but by the same shift amount.

Is there a way to apply different shifts to the different elements? Something like this:

__m128i a, b;

r0 := a0 << b0;
r1 := a1 << b1;
r2 := a2 << b2;
r3 := a3 << b3;
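As a baseline, here is a plain C version of the operation being asked for, working on four `uint32_t` lanes stored in arrays instead of an `__m128i`. The function name is hypothetical, and making counts of 32 or more produce 0 is an assumption (C leaves such shifts undefined). It is mainly useful as a reference to check a vector version against.

```c
#include <stdint.h>

/* Hypothetical reference helper: r[i] = a[i] << b[i] for each of the 4 lanes.
   Assumption: a count of 32 or more yields 0, because C leaves that shift undefined. */
void shift_lanes_ref(const uint32_t a[4], const uint32_t b[4], uint32_t r[4])
{
    for (int i = 0; i < 4; ++i)
        r[i] = (b[i] < 32) ? (a[i] << b[i]) : 0u;
}
```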


1 Answer

Editorial Team · Answer 1 · 2026-06-06T22:09:51+00:00

The _mm_shl_epi32() intrinsic does exactly that.

http://msdn.microsoft.com/en-us/library/gg445138.aspx

However, it requires AMD's XOP instruction set, which only the Bulldozer family implements (Bulldozer, including the Interlagos Opterons, through Excavator). AMD dropped XOP starting with Zen, and no Intel processor supports it.
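A minimal sketch of calling `_mm_shl_epi32()` with GCC or Clang on x86-64. The function names are hypothetical. `__attribute__((target("xop")))` compiles only the helper for XOP, and `__builtin_cpu_supports("xop")` checks the CPU at run time, so the same binary still runs on machines without XOP (the wrapper just reports that it did nothing).

```c
#include <stdint.h>
#include <x86intrin.h>  /* GCC/Clang umbrella header; declares the XOP intrinsics */

/* Hypothetical helper: only this function is compiled for XOP.
   It must only be called after a run-time check for XOP support. */
__attribute__((target("xop")))
static void shift_lanes_xop_impl(const uint32_t a[4], const uint32_t b[4], uint32_t r[4])
{
    __m128i va = _mm_loadu_si128((const __m128i *)a);
    __m128i vb = _mm_loadu_si128((const __m128i *)b);
    _mm_storeu_si128((__m128i *)r, _mm_shl_epi32(va, vb));  /* one VPSHLD instruction */
}

/* Hypothetical wrapper: returns 1 and fills r on XOP hardware, otherwise returns 0. */
int shift_lanes_xop(const uint32_t a[4], const uint32_t b[4], uint32_t r[4])
{
    if (!__builtin_cpu_supports("xop"))
        return 0;
    shift_lanes_xop_impl(a, b, r);
    return 1;
}
```

Pointers rather than `__m128i` values cross the boundary between the XOP and non-XOP functions, which keeps the calling convention independent of the target attribute.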

Without XOP, you will need to do it the hard way: pull the elements out and shift them one by one. With SSE4.1 you can do that using the following intrinsics:

_mm_insert_epi32()
_mm_extract_epi32()

http://software.intel.com/sites/products/documentation/studio/composer/en-us/2011/compiler_c/intref_cls/common/intref_sse41_reg_ins_ext.htm

These let you move individual 32-bit elements of a 128-bit register into general-purpose registers, shift each one with an ordinary scalar shift, and insert the results back.
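A minimal sketch of the extract/shift/insert approach for GCC or Clang, with hypothetical function names. `target("sse4.1")` lets the file build without `-msse4.1`, but the CPU must still support SSE4.1 at run time. Because the lane index of `_mm_extract_epi32()` and `_mm_insert_epi32()` must be a compile-time constant, the four lanes are written out by hand. Masking each count to 0..31 is an assumption made so the C shift is always defined.

```c
#include <stdint.h>
#include <smmintrin.h>  /* SSE4.1: _mm_extract_epi32, _mm_insert_epi32 */

/* Hypothetical helper: per-lane left shift using extract / scalar shift / insert.
   Counts are masked to 0..31 so each C shift is defined. */
__attribute__((target("sse4.1")))
static __m128i sllv_epi32_sse41(__m128i a, __m128i b)
{
    uint32_t r0 = (uint32_t)_mm_extract_epi32(a, 0) << (_mm_extract_epi32(b, 0) & 31);
    uint32_t r1 = (uint32_t)_mm_extract_epi32(a, 1) << (_mm_extract_epi32(b, 1) & 31);
    uint32_t r2 = (uint32_t)_mm_extract_epi32(a, 2) << (_mm_extract_epi32(b, 2) & 31);
    uint32_t r3 = (uint32_t)_mm_extract_epi32(a, 3) << (_mm_extract_epi32(b, 3) & 31);

    __m128i r = a;  /* every lane is overwritten below */
    r = _mm_insert_epi32(r, (int)r0, 0);
    r = _mm_insert_epi32(r, (int)r1, 1);
    r = _mm_insert_epi32(r, (int)r2, 2);
    r = _mm_insert_epi32(r, (int)r3, 3);
    return r;
}

/* Hypothetical array wrapper so the vector helper is easy to call and test. */
__attribute__((target("sse4.1")))
void shift_lanes_sse41(const uint32_t a[4], const uint32_t b[4], uint32_t r[4])
{
    __m128i va = _mm_loadu_si128((const __m128i *)a);
    __m128i vb = _mm_loadu_si128((const __m128i *)b);
    _mm_storeu_si128((__m128i *)r, sllv_epi32_sse41(va, vb));
}
```

Each lane costs two extracts, a scalar shift and an insert, which is where the overhead of this method comes from.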

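If you go with the extract/insert method, it'll be horrifically inefficient, since every element needs its own extracts, scalar shift and insert instead of one vector instruction for all four. That's why _mm_shl_epi32() exists in the first place.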
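On processors with AVX2 (Intel since Haswell), there is also a native per-element shift: `_mm_sllv_epi32()` (VPSLLVD) shifts each 32-bit element by its own count in one instruction, and counts above 31 produce 0. A minimal sketch with a run-time CPU check, again with hypothetical function names:

```c
#include <stdint.h>
#include <immintrin.h>  /* AVX2: _mm_sllv_epi32 */

/* Hypothetical helper: only this function is compiled for AVX2. */
__attribute__((target("avx2")))
static void shift_lanes_avx2_impl(const uint32_t a[4], const uint32_t b[4], uint32_t r[4])
{
    __m128i va = _mm_loadu_si128((const __m128i *)a);
    __m128i vb = _mm_loadu_si128((const __m128i *)b);
    _mm_storeu_si128((__m128i *)r, _mm_sllv_epi32(va, vb));  /* counts > 31 give 0 */
}

/* Hypothetical wrapper: returns 1 and fills r on AVX2 hardware, otherwise returns 0. */
int shift_lanes_avx2(const uint32_t a[4], const uint32_t b[4], uint32_t r[4])
{
    if (!__builtin_cpu_supports("avx2"))
        return 0;
    shift_lanes_avx2_impl(a, b, r);
    return 1;
}
```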
