So I was trying to do an array operation that looked something like this:
    for (int i = 0; i < 32; i++)
    {
        output[offset + i] += input[i];
    }
where output and input are float arrays (which are 16-byte aligned thanks to malloc). However, I can’t guarantee that offset % 4 == 0. How can I handle these alignment problems?
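Because a float is 4 bytes and an SSE vector is 16 bytes, and output is 16-byte aligned, &output[offset] is 16-byte aligned exactly when offset % 4 == 0. A minimal sketch for checking this at run time follows; the helper names is_aligned16 and floats_until_aligned are hypothetical, and the code assumes sizeof(float) == 4.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* True if p lies on a 16-byte boundary, i.e. it can be used with the
 * aligned SSE load/store intrinsics (_mm_load_ps / _mm_store_ps). */
static bool is_aligned16(const void *p)
{
    return ((uintptr_t)p & 15u) == 0;
}

/* Number of scalar floats to process before p reaches a 16-byte
 * boundary (0..3). Assumes p is at least float-aligned (4 bytes). */
static size_t floats_until_aligned(const float *p)
{
    size_t misalign = (size_t)((uintptr_t)p & 15u);  /* bytes past the last boundary */
    return misalign == 0 ? 0 : (16 - misalign) / sizeof(float);
}
```

For example, with output aligned and offset == 3, floats_until_aligned(&output[3]) returns 1: one scalar iteration is needed before aligned SSE stores to output can start.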
I thought of something like this:
    int c = 0;
    while ((offset + c) % 4 != 0)
    {
        output[offset + c] += input[c];
        c++;
    }
followed by an aligned loop. Obviously this can’t work: once &output[offset + c] is aligned, &input[c] generally is not, so the loop would need unaligned accesses to input.
Is there a way to vectorize my original loop?
Moving comments to an answer:
There are SSE instructions for misaligned memory accesses. They are accessible via the following intrinsics:
- _mm_loadu_ps() (unaligned load)
- _mm_storeu_ps() (unaligned store)

and similarly for all the double and integer types. So if you can’t guarantee alignment, this is the easy way to go. If possible, though, the ideal solution is to align your arrays from the start so that you avoid this problem altogether.
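Applied to the loop from the question, the unaligned intrinsics give something like the following minimal sketch. The function name add_unaligned is hypothetical, and the code assumes an x86 target with SSE (enabled by default on every x86-64 compiler).

```c
#include <stddef.h>
#include <xmmintrin.h>  /* SSE: _mm_loadu_ps, _mm_storeu_ps, _mm_add_ps */

/* output[offset + i] += input[i] for i in [0, n), 4 floats at a time.
 * Only unaligned loads/stores are used, so neither output + offset nor
 * input has to be 16-byte aligned. */
static void add_unaligned(float *output, const float *input, size_t offset, size_t n)
{
    float *dst = output + offset;
    size_t i = 0;

    for (; i + 4 <= n; i += 4) {
        __m128 d = _mm_loadu_ps(dst + i);    /* 4 floats, any alignment */
        __m128 s = _mm_loadu_ps(input + i);
        _mm_storeu_ps(dst + i, _mm_add_ps(d, s));
    }

    /* Scalar tail when n is not a multiple of 4 (n = 32 in the question leaves none). */
    for (; i < n; i++)
        dst[i] += input[i];
}
```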
There will still be a performance penalty for misaligned accesses, but it is unavoidable unless you resort to extremely messy shift/shuffle hacks (such as _mm_alignr_epi8()).

The code using _mm_loadu_ps and _mm_storeu_ps is actually 50% slower than what gcc produces by itself.
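One plausible explanation (not verified here) is that gcc's auto-vectorizer does not have to treat both arrays as misaligned: it can peel scalar iterations until one access is aligned, which is essentially the peeling idea from the question with an unaligned load for input. Whether gcc actually peels depends on its version and the target's cost model. The sketch below puts both options side by side; the function names add_scalar and add_peeled are hypothetical, and whether the hand-written version beats the all-unaligned one on a given CPU should be measured rather than assumed.

```c
#include <stddef.h>
#include <stdint.h>
#include <xmmintrin.h>  /* SSE: _mm_load_ps, _mm_loadu_ps, _mm_add_ps, _mm_store_ps */

/* Option 1: leave it to the compiler. restrict promises gcc that output
 * and input do not overlap (assumed true here), which helps the
 * auto-vectorizer enabled by -O3 or -ftree-vectorize. */
static void add_scalar(float *restrict output, const float *restrict input,
                       size_t offset, size_t n)
{
    for (size_t i = 0; i < n; i++)
        output[offset + i] += input[i];
}

/* Option 2: the peeling idea from the question, written by hand.
 * Scalar iterations run until &output[offset + i] is 16-byte aligned;
 * the main loop then uses an aligned load/store for output and an
 * unaligned load for input, whose alignment is generally different. */
static void add_peeled(float *output, const float *input, size_t offset, size_t n)
{
    size_t i = 0;

    /* Prologue: at most 3 scalar iterations, since floats are 4-byte aligned. */
    while (i < n && ((uintptr_t)&output[offset + i] & 15u) != 0) {
        output[offset + i] += input[i];
        i++;
    }

    /* Main loop: 4 floats per iteration. */
    for (; i + 4 <= n; i += 4) {
        __m128 out = _mm_load_ps(&output[offset + i]);  /* aligned   */
        __m128 in  = _mm_loadu_ps(&input[i]);           /* unaligned */
        _mm_store_ps(&output[offset + i], _mm_add_ps(out, in));
    }

    /* Epilogue: remaining 0..3 elements. */
    for (; i < n; i++)
        output[offset + i] += input[i];
}
```

Compiling add_scalar with gcc -O3 -fopt-info-vec reports whether the loop was vectorized, and the assembly from -S shows which alignment strategy gcc chose.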