I have some performance-heavy code that performs bit manipulations. It can be reduced to the following well-defined problem:
Given a 13-bit bitmap, construct a 26-bit bitmap that contains the original bits spaced at even positions.
To illustrate:
0000000000000000000abcdefghijklm (input, 32 bits)
0000000a0b0c0d0e0f0g0h0i0j0k0l0m (output, 32 bits)
I currently have it implemented in the following way in C:
if (input & (1 << 12))
output |= 1 << 24;
if (input & (1 << 11))
output |= 1 << 22;
if (input & (1 << 10))
output |= 1 << 20;
...
My compiler (MS Visual Studio) turned this into the following:
test eax,1000h
jne 0064F5EC
or edx,1000000h
... (repeated 13 times with minor differences in constants)
I wonder whether I can make it any faster. I would like to have my code written in C, but switching to assembly language is possible.
- Can I use some MMX/SSE instructions to process all bits at once?
- Maybe I can use multiplication? (multiply by 0x11111111 or some other magical constant)
- Would it be better to use condition-set instruction (SETcc) instead of conditional-jump instruction? If yes, how can I make the compiler produce such code for me?
- Any other idea how to make it faster?
- Any idea how to do the inverse bitmap transformation (I have to implement it too, bit it’s less critical)?
Do it with a lookup table. 2^13 sound like a lot of entries but they will easily fit into the CPU cache.
Oh, and if there’s garbage in the other 19 bits, you need to mask them out first.