While reading the nginx code, I came across this function:
#define ngx_cpymem(dst, src, n) (((u_char *) memcpy(dst, src, n)) + (n))
static ngx_inline u_char *
ngx_copy(u_char *dst, u_char *src, size_t len)
{
    if (len < 17) {
        while (len) {
            *dst++ = *src++;
            len--;
        }
        return dst;
    } else {
        return ngx_cpymem(dst, src, len);
    }
}
It's a simple string copy function. But why does it test the length of the string and switch to memcpy only when the length is >= 17?
It is an optimization: for very small strings, a simple copy loop is faster than calling a system (libc) copy function.
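To make the trade-off concrete, here is a minimal standalone sketch of the same idea in plain C. The names copy_bytes, SMALL_COPY_LIMIT and build_request_line are made up for this example and are not nginx identifiers; only the cutoff of 17 and the convention of returning the end pointer come from the code in the question.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical name for this sketch; nginx hard-codes 17 in ngx_copy. */
#define SMALL_COPY_LIMIT 17

/* Copy len bytes and return the position just past the last byte written. */
static unsigned char *
copy_bytes(unsigned char *dst, const unsigned char *src, size_t len)
{
    if (len < SMALL_COPY_LIMIT) {
        /* Short input: plain byte loop, no library call. */
        while (len) {
            *dst++ = *src++;
            len--;
        }
        return dst;
    }

    /* Longer input: hand the work to the library routine. */
    return (unsigned char *) memcpy(dst, src, len) + len;
}

/*
 * Hypothetical helper: joins two pieces into buf by chaining copies,
 * NUL-terminates the result and returns its length.
 */
static size_t
build_request_line(unsigned char *buf, size_t cap)
{
    static const unsigned char method[] = "GET ";                   /* 4 bytes: loop path */
    static const unsigned char path[] = "/a/rather/long/path.html"; /* 24 bytes: memcpy path */
    size_t mlen = sizeof(method) - 1;
    size_t plen = sizeof(path) - 1;
    unsigned char *p = buf;

    assert(mlen + plen + 1 <= cap);

    p = copy_bytes(p, method, mlen);
    p = copy_bytes(p, path, plen);
    *p = '\0';

    return (size_t) (p - buf);
}
```

Because each call returns the position after the last byte written, the next call can continue from that pointer. That is why this style suits building strings from many short pieces, which is exactly the case where the short-copy branch is taken.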
A simple copy with a while loop works rather fast for short strings, and the system copy function (usually) has optimizations for long strings. But the system copy also does a lot of checks and some setup.
Actually, there is a comment by the author just before this code: nginx, /src/core/ngx_string.h (search ngx_copy)
Also, two lines above it is
So, the author did measurements and concluded that ICC's optimized memcpy performs a lengthy CPU check to select the most optimized memcpy variant. He found that copying 16 bytes by hand is faster than the fastest memcpy code from ICC.
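If you want to check this kind of claim on your own machine, a rough timing harness could look like the sketch below. This is not the author's measurement method, just an assumption about how one might do it; time_copies and its parameters are hypothetical names. The results depend heavily on the compiler and its flags, since an optimizer may turn the byte loop into a memcpy call or expand memcpy inline.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <time.h>

/*
 * Hypothetical helper: returns the CPU seconds spent doing `rounds`
 * copies of `len` bytes (len <= 64), using memcpy when use_memcpy is
 * non-zero and a plain byte loop otherwise.
 */
static double
time_copies(int use_memcpy, size_t len, unsigned long rounds)
{
    static unsigned char src[64], dst[64];
    volatile unsigned char sink = 0;   /* keeps the copies observable */
    clock_t start;
    unsigned long i;

    assert(len <= sizeof(src));
    memset(src, 'x', sizeof(src));

    start = clock();

    for (i = 0; i < rounds; i++) {
        src[0] = (unsigned char) i;    /* vary the input each round */

        if (use_memcpy) {
            memcpy(dst, src, len);
        } else {
            unsigned char *d = dst;
            const unsigned char *s = src;
            size_t n = len;

            while (n--) {
                *d++ = *s++;
            }
        }

        sink ^= dst[0];
    }

    return (double) (clock() - start) / CLOCKS_PER_SEC;
}
```

Calling time_copies(0, 16, rounds) and time_copies(1, 16, rounds) with a large rounds value, and repeating for several lengths around 16, gives a crude picture of where the crossover lies for a given compiler and libc.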
For other compilers, nginx uses ngx_cpymem (memcpy) directly.
The author did a study of different memcpys for different sizes: