I am trying to calculate the squared Euclidean distance between two 4D float vectors using SSE2. My OS is Mac OS X 10.7 Lion.
When I use the Apple LLVM compiler in Xcode 4.5.2, everything is fine. But when I switch to GCC 4.2 in the project's settings, I get an EXC_BAD_ACCESS error at the _mm_mul_ps operation.
When I compile the code from the command line (g++ main.cpp) without additional arguments, I get a “Segmentation fault”. But when I enable any optimization level other than O0 (O1, O2, O3, Os), everything works.
I cannot reproduce this issue on my Ubuntu 12.04 machine with GCC 4.6.3.
#include <stdio.h>
#include <emmintrin.h>

typedef float SPPixel[4];

float sp_squared_color_diff(const SPPixel px1, const SPPixel px2) {
    SPPixel d;
    __m128 sse_px1 = _mm_load_ps(px1);
    __m128 sse_px2 = _mm_load_ps(px2);
    sse_px1 = _mm_sub_ps(sse_px1, sse_px2);
    sse_px2 = _mm_mul_ps(sse_px1, sse_px1); // EXC_BAD_ACCESS
    _mm_store_ps(d, sse_px2);
    return d[0] + d[1] + d[2] + d[3];
}

int main(int argc, const char * argv[]) {
    SPPixel a __attribute__ ((aligned (16))) = {1, 2, 3, 4};
    SPPixel b __attribute__ ((aligned (16))) = {2, 4, 6, 8};
    float result = sp_squared_color_diff(a, b);
    printf("result = %f\n", result);
    return 0;
}
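The local variable d is misaligned: _mm_store_ps requires a 16-byte-aligned address, but d is declared as a plain SPPixel, which only has the alignment of float. Fix the alignment in the typedef for SPPixel rather than having to remember it on every definition. Change:
    typedef float SPPixel[4];
to:
    typedef float SPPixel[4] __attribute__ ((aligned (16)));
and then you can also remove the __attribute__ ((aligned (16))) qualifiers in main.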
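For reference, here is a minimal sketch of how the code might look with the alignment moved into the typedef. It assumes a GCC-compatible compiler that accepts __attribute__ ((aligned (16))) on an array typedef. The assert calls and the sp_demo helper are my own additions for illustration and are not part of the original code; the asserts are there because array parameters decay to plain pointers, so the typedef cannot guarantee what a caller passes in.

```c
#include <assert.h>
#include <stdint.h>
#include <emmintrin.h>

/* The alignment lives in the typedef, so every SPPixel object
   (locals included) is placed on a 16-byte boundary. */
typedef float SPPixel[4] __attribute__ ((aligned (16)));

float sp_squared_color_diff(const SPPixel px1, const SPPixel px2) {
    SPPixel d; /* aligned via the typedef, safe for _mm_store_ps */

    /* Array parameters decay to pointers, so check the callers' buffers. */
    assert(((uintptr_t)px1 % 16) == 0);
    assert(((uintptr_t)px2 % 16) == 0);

    __m128 sse_px1 = _mm_load_ps(px1);
    __m128 sse_px2 = _mm_load_ps(px2);
    sse_px1 = _mm_sub_ps(sse_px1, sse_px2);  /* per-component difference */
    sse_px2 = _mm_mul_ps(sse_px1, sse_px1);  /* per-component square */
    _mm_store_ps(d, sse_px2);                /* aligned store into d */
    return d[0] + d[1] + d[2] + d[3];
}

/* Hypothetical helper (not in the original): runs the example
   vectors from the question without per-variable attributes. */
float sp_demo(void) {
    SPPixel a = {1, 2, 3, 4};
    SPPixel b = {2, 4, 6, 8};
    return sp_squared_color_diff(a, b);
}
```

If the buffers cannot be guaranteed to be aligned, _mm_loadu_ps and _mm_storeu_ps are the unaligned counterparts of the load and store used here.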