Question

Asked: May 12, 2026

I currently have the following code:

float a[4] = { 10, 20, 30, 40 };
float b[4] = { 0.1, 0.1, 0.1, 0.1 };
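/* b = a * b for four packed floats; xmm0 and memory are declared as clobbered */
asm volatile("movups (%0), %%xmm0\n\t"
             "mulps (%1), %%xmm0\n\t"
             "movups %%xmm0, (%1)"
             :: "r" (a), "r" (b)
             : "xmm0", "memory");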

First of all, I have a few questions:

(1) If I were to align the arrays on 16-byte boundaries, would it even work? Since the arrays are allocated on the stack, is it true that aligning them is near impossible?

See the selected answer to this post: Are stack variables aligned by the GCC __attribute__((aligned(x)))?

(2) Could the code be refactored at all to make it more efficient? What if I put both float arrays in registers rather than just one?

Thanks

1 Answer

Editorial Team · Answer 1 · 2026-05-12T06:52:20+00:00

To (1): "If I were to align the arrays on 16-byte boundaries, would it even work? Since the arrays are allocated on the stack, is it true that aligning them is near impossible?"

Alignment on the stack has to work. Otherwise intrinsics, which rely on 16-byte-aligned local variables, would not work either. I would guess the problem in the post you linked had to do with the exorbitant alignment value its author selected.
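As a rough check of this claim, the sketch below requests 16-byte alignment for two stack arrays and inspects their addresses. It assumes GCC or Clang, since __attribute__((aligned(16))) is a compiler extension (C11 offers _Alignas(16) as a portable alternative). The function names is_aligned16, stack_arrays_are_aligned and check_stack_alignment are made up for this illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Returns 1 if p lies on a 16-byte boundary, 0 otherwise. */
int is_aligned16(const void *p) {
    return ((uintptr_t)p & 15) == 0;
}

/* Declares two stack arrays with a 16-byte alignment request
   (GCC/Clang attribute syntax) and reports whether both addresses
   really are multiples of 16. */
int stack_arrays_are_aligned(void) {
    float a[4] __attribute__((aligned(16))) = { 10, 20, 30, 40 };
    float b[4] __attribute__((aligned(16))) = { 0.1f, 0.1f, 0.1f, 0.1f };
    return is_aligned16(a) && is_aligned16(b);
}

/* Aborts via assert if the alignment request was not honored. */
void check_stack_alignment(void) {
    assert(stack_arrays_are_aligned());
}
```

Calling check_stack_alignment() from main should return silently on a compiler that honors the attribute for a modest value such as 16.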

To (2):

No, loading both arrays into registers should not make a difference in performance. See this site for the instruction timings of several processors.
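For comparison, here is a minimal sketch of the same multiplication written with SSE intrinsics, with both operands loaded into registers first. One practical difference from the inline assembly is worth noting: mulps with a memory operand requires that operand to be 16-byte aligned, whereas the unaligned loads used here (the counterpart of movups) do not. The helper name mul4_inplace is hypothetical, and the scalar fallback is only there so the sketch compiles on targets without SSE.

```c
#include <assert.h>
#include <stddef.h>
#ifdef __SSE__
#include <xmmintrin.h>   /* _mm_loadu_ps, _mm_mul_ps, _mm_storeu_ps */
#endif

/* Hypothetical helper: b[i] = a[i] * b[i] for four floats.
   Neither pointer has to be 16-byte aligned. */
void mul4_inplace(const float *a, float *b) {
    assert(a != NULL && b != NULL);
#ifdef __SSE__
    __m128 va = _mm_loadu_ps(a);            /* unaligned load, like movups */
    __m128 vb = _mm_loadu_ps(b);            /* second operand in a register too */
    _mm_storeu_ps(b, _mm_mul_ps(va, vb));   /* multiply, then unaligned store */
#else
    for (int i = 0; i < 4; ++i)             /* scalar fallback without SSE */
        b[i] = a[i] * b[i];
#endif
}
```

With the arrays from the question, mul4_inplace(a, b) leaves the four products in b. Letting the compiler pick the registers also removes the need for a hand-written clobber list.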

How alignment of stack variables works:

push    ebp
mov     ebp, esp
and     esp, -16                ; fffffff0H
sub     esp, 200                ; 000000c8H

The "and" instruction clears the low four bits of esp, which rounds it down to a multiple of 16, so the stack frame starts on a 16-byte boundary.
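The arithmetic behind that prologue can be sketched in C. The addresses below are made-up example values and the function names align_down16 and frame_bottom are hypothetical; the point is only that masking with -16 (fffffff0H) rounds an address down to the nearest multiple of 16.

```c
#include <assert.h>
#include <stdint.h>

/* Rounds addr down to the nearest multiple of 16, as "and esp, -16" does.
   ~15 has the same bit pattern as -16: all ones except the low four bits. */
uintptr_t align_down16(uintptr_t addr) {
    return addr & ~(uintptr_t)15;
}

/* Mirrors the prologue: align the stack pointer, then reserve size bytes. */
uintptr_t frame_bottom(uintptr_t esp, uintptr_t size) {
    uintptr_t aligned = align_down16(esp);
    /* Alignment never moves the pointer by more than 15 bytes. */
    assert(aligned % 16 == 0 && esp - aligned < 16);
    return aligned - size;
}
```

For example, align_down16(0x1008) yields 0x1000. Since the aligned value is a known multiple of 16, the compiler can place a variable at a fixed offset from it and be sure of that variable's alignment.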
