Must data be 16-byte aligned so that SSE instructions can process it without a segmentation fault? I am compiling with gcc and the -msse2 option. I want to use _mm_cmpgt_epi32 to compare a large int array, but I found that it only works at array positions whose subscript is a multiple of 4; at any other position it fails.
Yes. When you load or store data to or from SSE registers, the memory address needs to be 16-byte aligned, unless you use the unaligned versions of the load/store instructions, e.g. _mm_loadu_si128/_mm_storeu_si128. With 4-byte ints, only every fourth element of a 16-byte-aligned array starts on a 16-byte boundary, which is why only subscripts that are multiples of 4 work for you. There is typically a performance penalty for using the unaligned load/store instructions, however, so one would normally try to ensure correct data alignment at all times and only use unaligned loads/stores as a last resort.
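As a rough illustration of the unaligned route, here is a minimal sketch of a comparison loop that can start at any int offset. The function name count_greater and its signature are made up for this example and are not from the question; it assumes an x86 target with SSE2 enabled (e.g. gcc -msse2).

```c
#include <emmintrin.h>  /* SSE2 intrinsics: _mm_loadu_si128, _mm_cmpgt_epi32, ... */
#include <stddef.h>

/* Hypothetical helper: returns how many indices i in [0, n) satisfy a[i] > b[i].
 * Neither a nor b has to be 16-byte aligned, because the loads are unaligned. */
size_t count_greater(const int *a, const int *b, size_t n)
{
    size_t count = 0;
    size_t i = 0;

    /* Process four 32-bit ints per iteration. */
    for (; i + 4 <= n; i += 4) {
        /* _mm_loadu_si128 accepts any address; _mm_load_si128 would
         * require a 16-byte-aligned one. */
        __m128i va = _mm_loadu_si128((const __m128i *)(a + i));
        __m128i vb = _mm_loadu_si128((const __m128i *)(b + i));

        /* Each 32-bit lane becomes all ones where a > b, else all zeros. */
        __m128i gt = _mm_cmpgt_epi32(va, vb);

        /* Collect the sign bit of each lane into the low 4 bits of an int. */
        int mask = _mm_movemask_ps(_mm_castsi128_ps(gt));

        /* Count the set bits among those 4 lanes. */
        for (int bit = 0; bit < 4; ++bit)
            count += (size_t)((mask >> bit) & 1);
    }

    /* Scalar tail for the remaining 0 to 3 elements. */
    for (; i < n; ++i)
        count += (size_t)(a[i] > b[i]);

    return count;
}
```

If you control how the arrays are allocated and always step through them four ints at a time from the start, you could instead align the storage to 16 bytes (for example with C11 alignas(16) or aligned_alloc) and switch the two loads to _mm_load_si128.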