I would like to copy a relatively short block of memory (less than 1 KB, typically 2-200 bytes) in a time-critical function. The best code for this on the CPU side seems to be rep movsd, but I cannot get my compiler to generate it. I hoped (and I vaguely remember seeing this) that memcpy would be expanded using compiler built-in intrinsics, but disassembly and debugging show that the compiler emits a call to the memcpy/memmove library implementation instead. I also hoped the compiler might be smart enough to recognize the following loop and use rep movsd on its own, but it does not.
char *dst;
const char *src;
// ...
for (int r=size; --r>=0; ) *dst++ = *src++;
Is there some way to make the Visual Studio compiler generate a rep movsd sequence other than using inline assembly?
Using memcpy with a constant size
What I have found in the meantime:
The compiler uses the intrinsic when the size of the copied block is known at compile time. When it is not, it calls the library implementation. When the size is known, the generated code is very nice and is selected based on the size: it may be a single mov, or movsd, or movsd followed by movsb, as needed.
It seems that if I always want movsb or movsd, even with a “dynamic” size, I will have to use inline assembly or a special intrinsic (see below). I know the size is “quite short”, but the compiler does not, and I cannot communicate this to it. I even tried __assume(size<16), but that is not enough.
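For reference, the __assume attempt can be sketched roughly as follows. The names CopyShort and ASSUME_SHORT are hypothetical, and the macro expands to nothing on non-MSVC compilers so the sketch still builds elsewhere.

```cpp
#include <cstddef>
#include <cstring>

// __assume is MSVC-specific; this hypothetical wrapper macro is a no-op
// on other compilers.
#if defined(_MSC_VER)
#  define ASSUME_SHORT(cond) __assume(cond)
#else
#  define ASSUME_SHORT(cond) ((void)0)
#endif

// Hypothetical helper: copy a block whose size is only known at run time.
void CopyShort(void *dst, const void *src, std::size_t size)
{
    ASSUME_SHORT(size < 16);      // optimizer hint only; the caller must guarantee it
    std::memcpy(dst, src, size);  // as observed above, this still ends up as a library call
}
```

The hint does not change the result of the copy; it only gives the optimizer a bound on size, which in my tests was not enough to get an inline expansion.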
Demo code, compile with “-Ob1” (expansion for inline only):
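A minimal sketch of such a demo, contrasting a run-time size with a compile-time size. The function names CopyDynamic and CopyFixed are hypothetical, and the instructions mentioned in the comments are what I observed, not something the language guarantees.

```cpp
#include <cstddef>
#include <cstring>

// Size known only at run time: observed to compile to a call into the
// memcpy library routine.
void CopyDynamic(void *dst, const void *src, std::size_t size)
{
    std::memcpy(dst, src, size);
}

// Size is a template parameter, i.e. a compile-time constant: the compiler
// can expand memcpy inline (a single mov, movsd, or movsd plus movsb).
template <std::size_t Size>
void CopyFixed(void *dst, const void *src)
{
    std::memcpy(dst, src, Size);
}
```

Calling CopyDynamic(&dst, &src, sizeof dst) and CopyFixed<sizeof dst>(&dst, &src) on two int variables and comparing the disassembly of both calls shows the difference.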
Specialized intrinsics
I recently found that there is a very simple and natural way to make the Visual Studio compiler copy characters using movsd: intrinsics. The following intrinsics may come in handy:
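The intrinsics in question are __movsb, __movsw and __movsd, declared in <intrin.h>; they map to rep movsb, rep movsw and rep movsd. Below is a rough sketch of a copy built on them. The helper name CopyWithMovs is hypothetical, and the memcpy branch is only there so the sketch also builds on compilers or targets without these intrinsics.

```cpp
#include <cstddef>
#include <cstring>
#if defined(_MSC_VER)
#  include <intrin.h>   // declares __movsb, __movsw, __movsd
#endif

// Hypothetical helper: copy `size` bytes using the string-move intrinsics.
void CopyWithMovs(void *dst, const void *src, std::size_t size)
{
#if defined(_MSC_VER) && (defined(_M_IX86) || defined(_M_X64))
    unsigned char *d = static_cast<unsigned char *>(dst);
    const unsigned char *s = static_cast<const unsigned char *>(src);
    const std::size_t dwords = size / 4;

    // rep movsd: copy all whole 4-byte units.
    __movsd(reinterpret_cast<unsigned long *>(d),
            reinterpret_cast<const unsigned long *>(s), dwords);

    // rep movsb: copy the remaining 0-3 bytes.
    __movsb(d + dwords * 4, s + dwords * 4, size % 4);
#else
    // Fallback for other compilers or architectures (an assumption of this sketch).
    std::memcpy(dst, src, size);
#endif
}
```

For very short blocks it may be simpler to call __movsb alone with the byte count; whether the movsd plus movsb split is actually faster depends on the CPU and should be measured.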