The code i want to optimize is basically a simple but large arithmetic formula,

Question

Asked: June 11, 2026

The code I want to optimize is basically a simple but large arithmetic formula. It should be fairly simple to analyze the code automatically and compute the independent multiplications/additions in parallel, but I read that autovectorization only works for loops.

I’ve read multiple times now that accessing single elements of a vector via a union or some other way should be avoided at all costs and replaced by a _mm_shuffle_pd instead (I’m working on doubles only)…

I can’t figure out how to store the content of a __m128d vector as doubles without accessing it through a union. Also, does an operation like this give any performance gain compared to scalar code?

union {
  __m128d v;
  double d[2];
} vec;
union {
  __m128d v;
  double d[2];
} vec2;

vec.v = index1;
vec2.v = index2;
/* The lanes hold the indices as doubles, so cast them to int before subscripting. */
temp1 = _mm_mul_pd(temp1, _mm_set_pd(bvec[(int)vec.d[1]], bvec[(int)vec2.d[1]]));

Also, the two unions look ridiculously ugly, but when I use a named union instead:

union dvec {
  __m128d v;
  double d[2];
} vec;

and then try to declare the indexX variables as dvec, the compiler complains that dvec is undeclared.

1 Answer

Editorial Team · Answer 1 · 2026-06-11T13:58:38+00:00

Unfortunately, if you look at MSDN, it says the following:

You should not access the __m128d fields directly. You can, however, see these types in the debugger. A variable of type __m128 maps to the XMM[0-7] registers.

I’m no expert in SIMD, but this tells me that what you’re doing won’t work, as the type is just not designed to be used that way.

EDIT:

I’ve just found this, and it says:

Use __m128, __m128d, and __m128i only on the left-hand side of an assignment, as a return value, or as a parameter. Do not use it in other arithmetic expressions such as “+” and “>>”.

It also says:

Use __m128, __m128d, and __m128i objects in aggregates, such as unions (for example, to access the float elements) and structures.

So maybe you can access the elements, but only through unions. That seems to contradict what MSDN says, however.
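As a rough illustration of the union approach, here is a minimal sketch in C. It assumes the "dvec is undeclared" error in the question comes from compiling as C, where a union tag must be written as `union dvec` unless a typedef is provided. The names `dvec` and `upper_product` are hypothetical and chosen only for this example.

```c
#include <emmintrin.h>  /* SSE2: __m128d, _mm_set_pd, _mm_mul_pd */

/* Hypothetical type name. The typedef lets C code write "dvec" directly
   instead of "union dvec". */
typedef union {
    __m128d v;
    double  d[2];
} dvec;

/* Multiply two packed pairs of doubles and read the upper lane back
   through the union. */
static double upper_product(double a_lo, double a_hi, double b_lo, double b_hi)
{
    dvec r;
    /* _mm_set_pd takes the HIGH element first, then the low element. */
    r.v = _mm_mul_pd(_mm_set_pd(a_hi, a_lo), _mm_set_pd(b_hi, b_lo));
    return r.d[1];  /* d[0] is the low lane, d[1] is the high lane */
}
```

With the typedef in place, a declaration such as `dvec vec, vec2;` replaces the two anonymous unions. Whether reading through the union is fast depends on the compiler and optimization level, so it is worth checking the generated code.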

EDIT2:

Here is another interesting resource that shows, with examples, how to use these SIMD types.

In the above link, you’ll find this snippet:

#include <math.h>
#include <emmintrin.h>
double in1_min(__m128d x)
{
    return x[0];
}

In the above we use a new extension in gcc 4.6 to access the high and low parts via indexing. Older versions of gcc require using a union and writing to an array of two doubles. This is cumbersome, and extra slow when optimization is turned off.
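For compilers without the indexing extension, the lanes can also be read with standard SSE2 intrinsics rather than a union. The following is a sketch, not a benchmarked recommendation; the helper names `low_lane`, `high_lane`, and `store_lanes` are made up for this example, while the intrinsics themselves come from `<emmintrin.h>`.

```c
#include <emmintrin.h>  /* SSE2: _mm_cvtsd_f64, _mm_unpackhi_pd, _mm_storeu_pd */

/* Low lane: _mm_cvtsd_f64 returns element 0 of the vector. */
static double low_lane(__m128d x)
{
    return _mm_cvtsd_f64(x);
}

/* High lane: _mm_unpackhi_pd(x, x) copies element 1 into both lanes,
   after which the low lane can be extracted as above. */
static double high_lane(__m128d x)
{
    return _mm_cvtsd_f64(_mm_unpackhi_pd(x, x));
}

/* Both lanes at once: store to a plain array. The unaligned store is used
   so the destination does not need 16-byte alignment. */
static void store_lanes(__m128d x, double out[2])
{
    _mm_storeu_pd(out, x);
}
```

For instance, with `x = _mm_set_pd(2.0, 1.0)`, `low_lane(x)` gives 1.0 and `high_lane(x)` gives 2.0, since `_mm_set_pd` takes the high element first.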
