I’ve done some inline ASM coding for SSE before, and it was not too hard even for someone who doesn’t know assembly. But I notice Microsoft also provides intrinsics that wrap many of these special instructions.
Is there a particular performance difference, or any other strong reason to prefer one over the other?
To repeat the title: this question is specifically about the intrinsics exposed by VC++ 2008 for unmanaged, native C++.
In general it’s better to use intrinsics: they make the programmer more productive, and a good compiler (e.g. Intel ICC) will do a decent job of register allocation, instruction scheduling, etc. The Microsoft compiler is not as good in this respect, but it probably still does a reasonable job, and you can always switch to ICC later if you need better performance.
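As an illustration, here is a minimal sketch of adding two float arrays with the SSE intrinsics declared in `<xmmintrin.h>`, which ships with VC++. The function name `add_arrays` is hypothetical and does not come from the question. Each `_mm_*` call corresponds to a single SSE instruction (for example, `_mm_add_ps` corresponds to `addps`), but the compiler chooses the XMM registers and schedules the loads and stores. Intrinsics also compile for x64 targets, where VC++ does not accept `__asm` blocks at all.

```cpp
#include <xmmintrin.h>  // SSE intrinsics: __m128, _mm_loadu_ps, _mm_add_ps, _mm_storeu_ps
#include <cstddef>      // std::size_t

// Hypothetical helper: out[i] = a[i] + b[i] for i in [0, n).
// Processes four floats per iteration with SSE, then handles any remainder in scalar code.
void add_arrays(const float* a, const float* b, float* out, std::size_t n)
{
    std::size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        __m128 va = _mm_loadu_ps(a + i);   // unaligned load of 4 floats
        __m128 vb = _mm_loadu_ps(b + i);
        __m128 vsum = _mm_add_ps(va, vb);  // addps; register choice is left to the compiler
        _mm_storeu_ps(out + i, vsum);      // unaligned store of 4 floats
    }
    for (; i < n; ++i) {                   // scalar tail for n not divisible by 4
        out[i] = a[i] + b[i];
    }
}
```

Calling `add_arrays(a, b, out, 6)` with `a = {1, 2, 3, 4, 5, 6}` and `b = {10, 20, 30, 40, 50, 60}` fills `out` with `{11, 22, 33, 44, 55, 66}`. The first four elements go through the SSE loop and the last two through the scalar tail. Writing the same loop in inline assembly would mean picking registers and ordering instructions by hand, which is exactly the work the answer suggests leaving to the compiler.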