I’ve been scratching my head over this one for some time. I’m using GCC 4.4.4 (I have also checked GCC 3.4.6, 4.4.6, and 4.6.3) and ran into an issue in some math I was doing. I boiled the problem down to the following self-contained program:
#include <stdio.h>
int main()
{
    float something[4] = { 1.0f, 2.0f, 3.0f, 4.0f };
    asm volatile
    (
        "movups %0, %%xmm0 \n\t"
        "movups %%xmm0, %0 \n\t"
        : "=m" (*something)
        :
        : "memory", "xmm0"
    );
    printf("%.0f %.0f %.0f %.0f\n",
           something[0], something[1], something[2], something[3]);
    return 0;
}
Compiled simply with
gcc -msse -O -o something something.c
it fails by somehow corrupting the first array element (except on the GCC 3.4.6 I tried … there, it works fine). I can’t, for the life of me, see anything fundamentally wrong here.
If I, instead, change the ASM block in question to
_mm_storeu_ps(something, _mm_loadu_ps(something));
it works fine. I checked the generated assembly code and found that the version with the ASM block contains one fewer store leading up to the SSE part (the store of 0x3f800000, i.e. 1.0f, to 48(%esp) is missing):
main:
pushl %ebp
movl %esp, %ebp
andl $-16, %esp
subl $64, %esp
movl $0x40000000, 52(%esp)
movl $0x40400000, 56(%esp)
movl $0x40800000, 60(%esp)
versus the correct version (the code using intrinsics):
main:
pushl %ebp
movl %esp, %ebp
andl $-16, %esp
subl $64, %esp
movl $0x3f800000, 48(%esp)
movl $0x40000000, 52(%esp)
movl $0x40400000, 56(%esp)
movl $0x40800000, 60(%esp)
WTF is wrong with either me or GCC?
(Note: this is a boiled-down, concise example showing the root problem I’ve tracked down. There are reasons for the ASM block and the volatile keyword, but they aren’t really relevant to the main concern I’m raising here.)
You used the wrong constraint (incidentally, this is the second such problem asked here today).
The = means output, so gcc thought you were going to assign *something, which is the first array element. So it figured it could omit the initialization, since you will overwrite it anyway. You should use the + sign to mark an operand as input-output, like so: "+m" (*something).
"=m" (something) in general means you will assign to the pointer, so gcc could decide to omit all of the initialization. Note that for arrays this shouldn’t even compile, just like the equivalent C code doesn’t. It’s just a lucky accident (a.k.a. a compiler bug) that it compiles and even works.