I compiled the code below with the VC++ 2010 compiler:
__declspec(dllexport)
unsigned int __cdecl __mm_getcsr(void) { return _mm_getcsr(); }
and the generated code was:
push ECX
stmxcsr [ESP]
mov EAX, [ESP]
pop ECX
retn
Why is there a push ECX/pop ECX instruction pair?
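The compiler is making room on the stack to store the MXCSR, because stmxcsr can only write to a memory operand. It could equally well have reserved that space by adjusting ESP directly, with a "sub ESP, 4" before the store and an "add ESP, 4" after the load.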
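A sketch of that alternative, assuming the compiler adjusts the stack pointer explicitly instead of pushing a scratch register (this is a reconstruction for illustration, not actual compiler output):

```
sub ESP, 4        ; reserve a 4-byte slot on the stack
stmxcsr [ESP]     ; store MXCSR into the slot
mov EAX, [ESP]    ; load it into the return register
add ESP, 4        ; release the slot
retn
```

In the generated code, the value that push ECX writes is irrelevant, since stmxcsr overwrites it immediately. Popping back into ECX is harmless because ECX is a scratch register under __cdecl, so the caller does not expect it to be preserved.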
But “push ecx” is shorter (it encodes in one byte, versus three bytes for “sub esp, 4”) and may also be faster.