I am in the process of optimizing my code for matrix multiplication.
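__m128 v1, v2, vMul, vRes;  // declarations added so the snippet stands on its own
for (int i = 0; i < SIZE; i++) {
    for (int j = 0; j < SIZE; j++) {
        // Reset the accumulator for every result element; otherwise the sums
        // from earlier (i, j) pairs leak into later ones.
        vRes = _mm_setzero_ps();
        for (int k = 0; k < SIZE; k += 4) {
            v1 = _mm_load_ps(&m1[i][k]);  // requires 16-byte aligned rows
            v2 = _mm_load_ps(&m2[j][k]);  // m2 is read row-wise, i.e. stored transposed
            vMul = _mm_mul_ps(v1, v2);
            vRes = _mm_add_ps(vRes, vMul);
        }
        // Two horizontal adds reduce the four partial sums to one value.
        vRes = _mm_hadd_ps(vRes, vRes);
        vRes = _mm_hadd_ps(vRes, vRes);
        _mm_store_ss(&result[i][j], vRes);
    }
}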
But g++ complains that "'_mm_hadd_ps' was not declared in this scope". Why is that? I am able to use other SSE functions such as _mm_add_ps …
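_mm_hadd_ps is an SSE3 intrinsic declared in pmmintrin.h, whereas _mm_add_ps belongs to the original SSE set in xmmintrin.h, so a header that provides the latter does not necessarily declare the former. Use #include <x86intrin.h>; it will include all intrinsics supported by the target processor. Including pmmintrin.h and the like directly is deprecated and not recommended in recent versions of GCC. Also make sure you target the SSE3 instruction set in your compilation, either by adding the -msse3 option or (better) by using the -march= option.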
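As a rough illustration of the advice above, here is a minimal sketch of the inner loop from the question, pulled out into a dot-product function. The name dot_sse3 is hypothetical and not from the original post. It assumes both arrays are 16-byte aligned and that their length is a multiple of 4. The target attribute is a GCC/Clang extension used here only so the sketch builds without -msse3; when compiling with -msse3 or a suitable -march= value it should not be needed.

```cpp
#include <x86intrin.h>  // umbrella header: pulls in the intrinsic headers the compiler provides
#include <cassert>
#include <cstddef>

// Hypothetical helper (not from the original post): dot product of two
// 16-byte-aligned float arrays whose length n is a multiple of 4.
// The target attribute enables SSE3 for this one function only; it is an
// assumption made for portability of the sketch and can be dropped when
// the whole translation unit is built with -msse3.
__attribute__((target("sse3")))
float dot_sse3(const float* a, const float* b, std::size_t n) {
    assert(n % 4 == 0);
    __m128 acc = _mm_setzero_ps();          // four running partial sums
    for (std::size_t k = 0; k < n; k += 4) {
        __m128 va = _mm_load_ps(a + k);     // aligned load of 4 floats
        __m128 vb = _mm_load_ps(b + k);
        acc = _mm_add_ps(acc, _mm_mul_ps(va, vb));
    }
    // SSE3 horizontal adds: after two passes every lane holds the full sum.
    acc = _mm_hadd_ps(acc, acc);
    acc = _mm_hadd_ps(acc, acc);
    return _mm_cvtss_f32(acc);              // extract the lowest lane
}
```

In the question's loop, result[i][j] would then correspond to dot_sse3(m1[i], m2[j], SIZE), assuming m2 holds the transposed matrix as the indexing there suggests. A typical build line would be something like g++ -O2 -msse3 followed by the source file name.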