Currently I have this function, which swaps the bytes of a value in order to change its endianness.
    template<typename Type, unsigned int Half = sizeof(Type)/2, unsigned int End = sizeof(Type)-1>
    inline void swapBytes(Type& x)
    {
        char* c = reinterpret_cast<char*>(&x);
        char tmp;
        for (unsigned int i = 0; i < Half; ++i) {
            tmp = c[i];
            c[i] = c[End-i];
            c[End-i] = tmp;
        }
    }
Some of my algorithms will call this function several million times, so every single instruction that can be avoided is a win.
My question is: how can this function be optimized?
First of all, check whether your hardware platform has byte-swap instructions; some platforms have them and some do not. If yours does, look for a library function (or compiler intrinsic) that uses them: check the docs, or stop in the debugger and look at the disassembly. There is a good chance that you will find one, and it is unlikely that anything else will work better than this.
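As an illustration of the "library function" route, here is a minimal sketch that wraps the byte-swap intrinsics I know of: `__builtin_bswap16/32/64` on GCC and Clang, and `_byteswap_ushort/_byteswap_ulong/_byteswap_uint64` on MSVC. The wrapper names `byteSwap16`, `byteSwap32` and `byteSwap64` are made up for this example, and the shift-and-mask branch is only a fallback for compilers I have not checked. On a C++23 toolchain, `std::byteswap` from `<bit>` should cover the same ground.

```cpp
#include <cstdint>
#if defined(_MSC_VER)
#include <stdlib.h>   // _byteswap_ushort, _byteswap_ulong, _byteswap_uint64
#endif

// Hypothetical wrappers: each one uses a compiler intrinsic when available
// and falls back to plain shifts and masks otherwise.
inline std::uint16_t byteSwap16(std::uint16_t v)
{
#if defined(_MSC_VER)
    return _byteswap_ushort(v);
#elif defined(__GNUC__) || defined(__clang__)
    return __builtin_bswap16(v);
#else
    return static_cast<std::uint16_t>((v << 8) | (v >> 8));
#endif
}

inline std::uint32_t byteSwap32(std::uint32_t v)
{
#if defined(_MSC_VER)
    return _byteswap_ulong(v);
#elif defined(__GNUC__) || defined(__clang__)
    return __builtin_bswap32(v);
#else
    return (v >> 24) | ((v >> 8) & 0x0000FF00u) |
           ((v << 8) & 0x00FF0000u) | (v << 24);
#endif
}

inline std::uint64_t byteSwap64(std::uint64_t v)
{
#if defined(_MSC_VER)
    return _byteswap_uint64(v);
#elif defined(__GNUC__) || defined(__clang__)
    return __builtin_bswap64(v);
#else
    // Swap the two 32-bit halves, then swap the bytes inside each half.
    return (static_cast<std::uint64_t>(byteSwap32(static_cast<std::uint32_t>(v))) << 32) |
           byteSwap32(static_cast<std::uint32_t>(v >> 32));
#endif
}
```

For example, `byteSwap32(0x11223344u)` yields `0x44332211u`. With optimization enabled, compilers typically turn these calls into a single instruction where the hardware has one, but it is worth confirming in the disassembly as suggested above.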
If no such function exists, as a last resort write your own function in assembler that uses these instructions.
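A rough sketch of the hand-written route, assuming an x86 or x86-64 target and a compiler that accepts GCC-style extended inline assembly (GCC, Clang). The function name `byteSwap32Asm` is invented for this example; on any other target the code falls back to shifts and masks so that it still compiles.

```cpp
#include <cstdint>

// Hypothetical helper: issues the x86 BSWAP instruction directly.
inline std::uint32_t byteSwap32Asm(std::uint32_t v)
{
#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__x86_64__) || defined(__i386__))
    // "+r" lets the compiler choose any general-purpose register and marks
    // it as both read and written by the instruction.
    __asm__("bswap %0" : "+r"(v));
    return v;
#else
    // Portable fallback for targets where the asm above does not apply.
    return (v >> 24) | ((v >> 8) & 0x0000FF00u) |
           ((v << 8) & 0x00FF0000u) | (v << 24);
#endif
}
```

Inline assembly tends to hide the operation from the optimizer, so if an intrinsic is available it is usually the better choice; treat this version as the fallback the answer describes.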
For a 2-byte type, a straight table lookup will work. The table needs 65,536 two-byte entries, i.e. 128 KB, which is not much for today's computers. For 32-bit types a full table (2^32 four-byte entries, i.e. 16 GB) is close to overkill, but in some (rare) cases it may still work on a big 64-bit box.
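The 2-byte table could look something like the sketch below. The names `swapTable16` and `swapBytes16Table` are made up for illustration, and building the table lazily inside a function-local static is just one possible design.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Builds the 65,536-entry table on first use; entry i holds i with its
// two bytes exchanged. Total size: 65,536 * 2 bytes = 128 KB.
inline const std::vector<std::uint16_t>& swapTable16()
{
    static const std::vector<std::uint16_t> table = [] {
        std::vector<std::uint16_t> t(65536);
        for (std::uint32_t i = 0; i < 65536; ++i) {
            // The cast keeps only the low 16 bits of the shifted value.
            t[i] = static_cast<std::uint16_t>((i << 8) | (i >> 8));
        }
        return t;
    }();
    return table;
}

// A swap is then a single indexed load.
inline std::uint16_t swapBytes16Table(std::uint16_t v)
{
    return swapTable16()[v];
}
```

Each lookup is a memory access, so whether it wins depends on how much of the table stays in cache; where the hardware has a swap instruction, that is likely to remain the faster option, as noted above.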
You can also use a combination of asm instructions, table conversion, and an optimized loop.