I’m writing a transpose function for vectors of eight 16-bit elements using SSE2 intrinsics. Since the function takes 8 arguments (together, an 8x8 matrix of 16-bit elements), I have no choice but to pass them by reference. Will the compiler optimize this (that is, will these __m128i objects be passed in registers rather than on the stack)?
Code snippet:
inline void transpose(__m128i &a0, __m128i &a1, __m128i &a2, __m128i &a3,
                      __m128i &a4, __m128i &a5, __m128i &a6, __m128i &a7) {
    // ....
}
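For reference, here is a minimal sketch of one common way such a body is written: three rounds of SSE2 unpack instructions (16-bit, then 32-bit, then 64-bit interleaves). It is not the asker's elided code. It keeps the by-reference signature from the question, and the caller `transpose_block` (a hypothetical name) loads a row-major 8x8 block of `int16_t` into eight locals, transposes, and stores it back. That is exactly the situation where inlining lets the references bind to the caller's locals.

```cpp
#include <emmintrin.h>  // SSE2 intrinsics
#include <cstdint>

// Transposes an 8x8 matrix of 16-bit elements held in eight SSE2 vectors.
// On entry ai holds row i; on return ai holds column i of the original matrix.
inline void transpose(__m128i &a0, __m128i &a1, __m128i &a2, __m128i &a3,
                      __m128i &a4, __m128i &a5, __m128i &a6, __m128i &a7) {
    // Stage 1: interleave 16-bit elements of row pairs (0,1), (2,3), (4,5), (6,7).
    __m128i b0 = _mm_unpacklo_epi16(a0, a1);
    __m128i b1 = _mm_unpackhi_epi16(a0, a1);
    __m128i b2 = _mm_unpacklo_epi16(a2, a3);
    __m128i b3 = _mm_unpackhi_epi16(a2, a3);
    __m128i b4 = _mm_unpacklo_epi16(a4, a5);
    __m128i b5 = _mm_unpackhi_epi16(a4, a5);
    __m128i b6 = _mm_unpacklo_epi16(a6, a7);
    __m128i b7 = _mm_unpackhi_epi16(a6, a7);

    // Stage 2: interleave 32-bit pairs, giving 4-element pieces of two columns.
    __m128i c0 = _mm_unpacklo_epi32(b0, b2);  // cols 0,1 of rows 0-3
    __m128i c1 = _mm_unpackhi_epi32(b0, b2);  // cols 2,3 of rows 0-3
    __m128i c2 = _mm_unpacklo_epi32(b1, b3);  // cols 4,5 of rows 0-3
    __m128i c3 = _mm_unpackhi_epi32(b1, b3);  // cols 6,7 of rows 0-3
    __m128i c4 = _mm_unpacklo_epi32(b4, b6);  // cols 0,1 of rows 4-7
    __m128i c5 = _mm_unpackhi_epi32(b4, b6);  // cols 2,3 of rows 4-7
    __m128i c6 = _mm_unpacklo_epi32(b5, b7);  // cols 4,5 of rows 4-7
    __m128i c7 = _mm_unpackhi_epi32(b5, b7);  // cols 6,7 of rows 4-7

    // Stage 3: join the 64-bit halves (rows 0-3 and rows 4-7) into full columns.
    a0 = _mm_unpacklo_epi64(c0, c4);
    a1 = _mm_unpackhi_epi64(c0, c4);
    a2 = _mm_unpacklo_epi64(c1, c5);
    a3 = _mm_unpackhi_epi64(c1, c5);
    a4 = _mm_unpacklo_epi64(c2, c6);
    a5 = _mm_unpackhi_epi64(c2, c6);
    a6 = _mm_unpacklo_epi64(c3, c7);
    a7 = _mm_unpackhi_epi64(c3, c7);
}

// Hypothetical caller: transposes a row-major 8x8 block of int16_t in place.
// The eight references passed to transpose() name the local r[0..7].
void transpose_block(int16_t *m) {
    __m128i r[8];
    for (int i = 0; i < 8; ++i)
        r[i] = _mm_loadu_si128(reinterpret_cast<const __m128i *>(m + 8 * i));
    transpose(r[0], r[1], r[2], r[3], r[4], r[5], r[6], r[7]);
    for (int i = 0; i < 8; ++i)
        _mm_storeu_si128(reinterpret_cast<__m128i *>(m + 8 * i), r[i]);
}
```

Even when the call is inlined, whether everything stays in registers also depends on the target. x86-64 has 16 xmm registers, which leaves room for the eight rows plus temporaries. 32-bit x86 has only 8, so some spilling there would not be surprising.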
Chances are that they will not be pushed onto the stack. If the function is inlined, the compiler effectively copies the operations (code) of the called function into the caller instead of passing the data from the caller to the callee. The references then simply refer to the caller's own __m128i variables, which the compiler is free to keep in registers.
Note, however, that inline is only a hint: the compiler may decide not to inline the call, in which case you would have to follow Zan’s advice and check what the compiled code actually looks like.
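If you want to remove the "hint" part and then verify the result, here is a small sketch under some assumptions. The macro `FORCE_INLINE` and the functions `add_and_swap` / `add_and_swap_rows` are hypothetical names. `add_and_swap` takes two __m128i by reference, just as `transpose` does. `__attribute__((always_inline))` is a GCC/Clang extension and `__forceinline` is the MSVC equivalent. Compiling with something like `g++ -O2 -S` and reading the assembly for `add_and_swap_rows` is one way to follow the "check the compiled code" advice. If inlining happened, you would typically see the loads, a `paddw`, and the stores with no `call`.

```cpp
#include <emmintrin.h>  // SSE2 intrinsics
#include <cstdint>

// Hypothetical macro: force inlining on GCC/Clang or MSVC.
#if defined(_MSC_VER)
#define FORCE_INLINE __forceinline
#else
#define FORCE_INLINE inline __attribute__((always_inline))
#endif

// Hypothetical helper taking SSE vectors by non-const reference, like transpose():
// a becomes a + b (per 16-bit lane), b becomes the old a.
static FORCE_INLINE void add_and_swap(__m128i &a, __m128i &b) {
    __m128i sum = _mm_add_epi16(a, b);
    b = a;
    a = sum;
}

// Inspect this function's assembly: once add_and_swap is inlined, a and b are
// just locals here, so an optimizing build can keep them in xmm registers.
void add_and_swap_rows(int16_t *p, int16_t *q) {
    __m128i a = _mm_loadu_si128(reinterpret_cast<const __m128i *>(p));
    __m128i b = _mm_loadu_si128(reinterpret_cast<const __m128i *>(q));
    add_and_swap(a, b);
    _mm_storeu_si128(reinterpret_cast<__m128i *>(p), a);
    _mm_storeu_si128(reinterpret_cast<__m128i *>(q), b);
}
```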