I’m trying to implement some inline assembler (in C/C++ code) to take advantage of SSE. I’d like to copy and duplicate values (from an XMM register, or from memory) to another XMM register. For example, suppose I have some values {1, 2, 3, 4} in memory. I’d like to copy these values such that xmm1 is populated with {1, 1, 1, 1}, xmm2 with {2, 2, 2, 2}, and so on and so forth.
Looking through the Intel reference manuals, I couldn’t find an instruction to do this. Do I just need to use a combination of repeated MOVSS and rotates (via PSHUFD)?
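There are two ways:
1. Use shufps exclusively: shuffle the source register with itself, selecting the same element for all four lanes.
2. Let the compiler choose the best way using _mm_set1_ps and _mm_cvtss_f32.
Note that the second method will produce horrible code on MSVC, as discussed here, and can only produce an ‘xxxx’ result (the lowest element copied to all four lanes), unlike the first option, which can broadcast any of the four elements.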
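Since the original snippets are not shown here, this is a minimal sketch of both approaches using intrinsics from xmmintrin.h. The helper names splat4 and splat_low are hypothetical, and the unaligned load is an assumption made so the sketch works for any float pointer; use _mm_load_ps if the data is known to be 16-byte aligned.

```c
#include <xmmintrin.h>  /* SSE: __m128, _mm_shuffle_ps, _mm_set1_ps, _mm_cvtss_f32 */

/* Method 1 (hypothetical helper): load four floats once, then use shufps
 * to broadcast each element into its own vector.
 * out[i] ends up holding {src[i], src[i], src[i], src[i]}. */
static void splat4(const float *src, __m128 out[4])
{
    __m128 v = _mm_loadu_ps(src);  /* unaligned load, safe for any pointer */
    out[0] = _mm_shuffle_ps(v, v, _MM_SHUFFLE(0, 0, 0, 0));
    out[1] = _mm_shuffle_ps(v, v, _MM_SHUFFLE(1, 1, 1, 1));
    out[2] = _mm_shuffle_ps(v, v, _MM_SHUFFLE(2, 2, 2, 2));
    out[3] = _mm_shuffle_ps(v, v, _MM_SHUFFLE(3, 3, 3, 3));
}

/* Method 2 (hypothetical helper): extract the lowest element as a scalar
 * and let the compiler decide how to broadcast it. Only the lowest
 * element can be broadcast this way. */
static __m128 splat_low(__m128 v)
{
    return _mm_set1_ps(_mm_cvtss_f32(v));
}
```

With {1, 2, 3, 4} in memory, splat4 fills out[0] with {1, 1, 1, 1}, out[1] with {2, 2, 2, 2}, and so on, which matches what the question asks for. The shuffle selector must be a compile-time constant, which is why each lane gets its own call rather than a loop index.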
Inline assembly is highly unportable. Use intrinsics instead.