Consider the simple code:
UINT64 result;
UINT32 high, low;
...
result = ((UINT64)high << 32) | (UINT64)low;
Do modern compilers turn that into a real barrel shift on high, or optimize it to a simple copy to the right location?
If not, then using a union would seem to be more efficient than the shift that most people appear to use. However, having the compiler optimize this is the ideal solution.
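For concreteness, the two variants being compared might look like the sketch below. The function names are made up for illustration, and the `<stdint.h>` types stand in for `UINT32`/`UINT64`. The union version assumes a little-endian target (low word stored first) and relies on C's rule that reading a union member other than the one last written reinterprets the bytes; that is not guaranteed in C++.

```c
#include <assert.h>
#include <stdint.h>

/* Shift-and-OR version, as in the question. Portable across endianness. */
uint64_t combine_shift(uint32_t high, uint32_t low)
{
    return ((uint64_t)high << 32) | (uint64_t)low;
}

/* Union version. ASSUMPTION: little-endian layout, so the low 32 bits
 * occupy the first four bytes of the 64-bit value. On a big-endian
 * target the two struct members would have to be swapped. */
uint64_t combine_union(uint32_t high, uint32_t low)
{
    union {
        uint64_t whole;
        struct {
            uint32_t low;
            uint32_t high;
        } parts;
    } u;

    u.parts.low = low;
    u.parts.high = high;
    return u.whole;
}

/* Hypothetical helper: checks that both versions agree for one input. */
void check_combine(uint32_t high, uint32_t low)
{
    assert(combine_shift(high, low) == combine_union(high, low));
}
```

The shift version has no layout dependence at all, which is one reason to prefer it if the compiler generates equally good code for both.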
I’m wondering how I should advise people when they do require that extra little bit of performance.
I wrote the following (hopefully valid) test:
Running a diff of the unoptimized output of gcc -S:
I don’t know assembly, so it’s hard for me to analyze that. However, it looks like some shifting is taking place, as expected, in the non-union (top) version.
But with -O2 optimization enabled, the output was identical. So the same code was generated, and both approaches will have the same performance, at least with this compiler and target (gcc version 4.5.2 on Linux/AMD64).
Partial output of the optimized -O2 code, with or without the union:
The snippet begins immediately after the jump generated by the if line.