The code I want to optimize is basically a simple but large arithmetic formula. It should be fairly simple to analyze the code automatically and compute the independent multiplications/additions in parallel, but I read that autovectorization only works for loops.
I've read multiple times now that accessing single elements of a vector via a union or some other way should be avoided at all costs and replaced by a _mm_shuffle_pd instead (I'm working on doubles only)…
I can't figure out how to store the content of a __m128d vector as doubles without accessing it through a union. Also, does an operation like this give any performance gain compared to scalar code?
union {
__m128d v;
double d[2];
} vec;
union {
__m128d v;
double d[2];
} vec2;
vec.v = index1;
vec2.v = index2;
/* The lanes hold indices stored as doubles, so cast them before subscripting bvec. */
temp1 = _mm_mul_pd(temp1, _mm_set_pd(bvec[(int)vec.d[1]], bvec[(int)vec2.d[1]]));
Also, the two unions look ridiculously ugly, but when using
union dvec {
__m128d v;
double d[2];
} vec;
and trying to declare the indexX variables as dvec, the compiler complained that dvec is undeclared.
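A possible explanation for the "undeclared" error, assuming the file is compiled as C rather than C++: in C, `union dvec { ... }` only introduces the tag `dvec`, so a variable has to be declared as `union dvec x;`. A `typedef` makes the bare name usable. The sketch below shows that form; the helper `dvec_lane` is a hypothetical name used only for illustration.

```c
#include <emmintrin.h>  /* SSE2: __m128d, _mm_set_pd */

/* The typedef gives the union a plain type name, so `dvec x;` compiles in C. */
typedef union {
    __m128d v;
    double  d[2];
} dvec;

/* Hypothetical helper: read lane i (0 = low, 1 = high) through the union. */
static double dvec_lane(__m128d x, int i)
{
    dvec u;
    u.v = x;
    return u.d[i];
}
```

With this in place, a single type covers both index vectors, for example `dvec a, b; a.v = index1; b.v = index2;`.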
Unfortunately, if you look at MSDN, it says the following:
I’m no expert in SIMD, however this tells me that what you’re doing won’t work as it’s just not designed to.
EDIT:
I’ve just found this, and it says:
It also says:
So maybe you can use them, but only in unions. Seems contradictory to what MSDN says, however.
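For the original question of getting doubles out of a __m128d without a union, here is a minimal sketch using only SSE2 intrinsics from `<emmintrin.h>`. The function names (`lane_low`, `lane_high`, `store_both`, `gather_high_lanes`) are made up for this example, and the last one assumes that the high lane of each index vector holds a non-negative array index stored as a double, as in the question's code.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */

/* Low lane: a direct conversion, no memory round trip required. */
static double lane_low(__m128d x)
{
    return _mm_cvtsd_f64(x);
}

/* High lane: copy it into the low position first, then convert. */
static double lane_high(__m128d x)
{
    return _mm_cvtsd_f64(_mm_unpackhi_pd(x, x));
}

/* Both lanes at once: store to a plain array (no alignment requirement). */
static void store_both(__m128d x, double out[2])
{
    _mm_storeu_pd(out, x);
}

/* Union-free version of the lookup in the question: truncate the high
   lane of each index vector to int and load the two table entries. */
static __m128d gather_high_lanes(const double *bvec, __m128d index1, __m128d index2)
{
    int i1 = _mm_cvttsd_si32(_mm_unpackhi_pd(index1, index1));
    int i2 = _mm_cvttsd_si32(_mm_unpackhi_pd(index2, index2));
    return _mm_set_pd(bvec[i1], bvec[i2]);
}
```

The multiplication from the question would then read `temp1 = _mm_mul_pd(temp1, gather_high_lanes(bvec, index1, index2));`. Whether this beats scalar code depends on how much arithmetic surrounds the lookups, since the table loads themselves are still scalar; measuring is the only reliable way to tell.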
EDIT2:
Here is another interesting resource that describes, with examples, how to use these SIMD types.
In the above link, you’ll find this line: