I currently have the following code:
float a[4] = { 10, 20, 30, 40 };
float b[4] = { 0.1, 0.1, 0.1, 0.1 };
asm volatile("movups (%0), %%xmm0\n\t"
"mulps (%1), %%xmm0\n\t"
"movups %%xmm0, (%1)"
:: "r" (a), "r" (b)
: "xmm0", "memory");
First of all, I have a few questions:
(1) If I were to align the arrays on 16-byte boundaries, would it even work? Since the arrays are allocated on the stack, is it true that aligning them is nearly impossible?
See the accepted answer to this post: Are stack variables aligned by the GCC __attribute__((aligned(x)))?
(2) Could the code be refactored at all to make it more efficient? What if I put both float arrays in registers rather than just one?
Thanks
Alignment on the stack has to work. Otherwise the SSE intrinsics, whose __m128 type requires 16-byte alignment, could not be used with local variables. My guess is that the problem in the post you linked had to do with the exorbitant alignment value that was chosen there.
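As a quick check of the claim above, here is a minimal sketch that requests 16-byte alignment for two stack arrays and tests the resulting addresses. It assumes GCC or Clang (for the __attribute__ syntax); the function names is_aligned16 and stack_arrays_are_aligned are made up for this illustration.

```c
#include <stdint.h>
#include <stdio.h>

/* Returns 1 if p sits on a 16-byte boundary, 0 otherwise. */
int is_aligned16(const void *p)
{
    return ((uintptr_t)p % 16) == 0;
}

/* Hypothetical demo: declares the two arrays from the question on the
   stack with a 16-byte alignment request and reports whether both
   addresses actually satisfy it. */
int stack_arrays_are_aligned(void)
{
    float a[4] __attribute__((aligned(16))) = { 10, 20, 30, 40 };
    float b[4] __attribute__((aligned(16))) = { 0.1f, 0.1f, 0.1f, 0.1f };

    printf("a = %p, b = %p\n", (void *)a, (void *)b);
    return is_aligned16(a) && is_aligned16(b);
}
```

If stack_arrays_are_aligned() returns 1, the arrays can be used with aligned loads and stores (movaps) and as a memory operand of mulps.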
Regarding (2):
No, there shouldn’t be a difference in performance. See this site for the instruction timings of several processors.
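For comparison, this is roughly what the same multiplication looks like with intrinsics, loading both arrays into registers first. It is only a sketch: the function name mul4_inplace is made up, it assumes an x86 target with SSE, and the compiler is free to fold one of the loads back into a memory operand of mulps, which fits the point that the two forms should not differ in speed.

```c
#include <xmmintrin.h>  /* SSE: _mm_loadu_ps, _mm_mul_ps, _mm_storeu_ps */

/* Multiplies four floats element-wise and writes the result back to b,
   mirroring the movups / mulps / movups sequence from the question.
   Unaligned loads and stores are used, so a and b need no particular
   alignment. */
void mul4_inplace(const float *a, float *b)
{
    __m128 va = _mm_loadu_ps(a);          /* load a[0..3] */
    __m128 vb = _mm_loadu_ps(b);          /* load b[0..3] */
    _mm_storeu_ps(b, _mm_mul_ps(va, vb)); /* b[i] = a[i] * b[i] */
}
```

Called as mul4_inplace(a, b) with the arrays from the question, b ends up holding approximately 1, 2, 3 and 4. No clobber list is needed here because the compiler does the register allocation itself.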
How alignment of stack variables works:
The and instruction aligns the start of the stack frame to a 16-byte boundary.
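The masking that such an and performs on the stack pointer can be sketched in plain C. This only illustrates the arithmetic and is not what the compiler literally emits; the helper name align_down16 is made up.

```c
#include <stdint.h>

/* Clears the low four bits of addr, which rounds it down to the nearest
   multiple of 16. Rounding down is the right direction for a stack that
   grows towards lower addresses. */
uintptr_t align_down16(uintptr_t addr)
{
    return addr & ~(uintptr_t)15;
}
```

For example, align_down16(0x1007) gives 0x1000, while an address that is already a multiple of 16, such as 0x1010, is returned unchanged.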