I have an array of uint64_t, and for every unset bit (0) I do some evaluation.
The evaluations are not terribly expensive, but very few bits are unset. Profiling shows that I spend a lot of time in the find-the-next-unset-bit logic.
Is there a faster way (on a Core 2 Duo)?
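My current code skips runs of high 1s (after inversion they are 0s, so the loop ends early), but it still tests every lower bit one at a time: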
    for(int y=0; y<height; y++) {
        uint64_t xbits = ~board[y];
        int x = 0;
        while(xbits) {
            if(xbits & 1) {
                ... with x and y
            }
            x++;
            xbits >>= 1;
        }
    }
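For comparison, here is a minimal sketch of one commonly suggested alternative: jump straight to the lowest set bit of the inverted word with a count-trailing-zeros operation, then clear that bit with `xbits &= xbits - 1`, so the inner loop runs once per unset bit instead of once per bit position. It assumes GCC or Clang, whose `__builtin_ctzll` typically compiles to a bit-scan instruction on x86. The names `for_each_unset_bit`, `unset_bit_fn` and `count_visit` are hypothetical and only stand in for the "with x and y" work. I have not measured this, so treat it as a candidate to benchmark rather than a known win.

```c
#include <stdint.h>

/* Hypothetical callback type: called once for each unset bit at (x, y). */
typedef void (*unset_bit_fn)(int x, int y, void *ctx);

/* Visit every 0 bit in board[0..height-1], lowest x first within each row. */
static void for_each_unset_bit(const uint64_t *board, int height,
                               unset_bit_fn visit, void *ctx)
{
    for (int y = 0; y < height; y++) {
        uint64_t xbits = ~board[y];          /* 1 now marks an unset bit */
        while (xbits) {                      /* one iteration per unset bit */
            /* GCC/Clang builtin; only defined for non-zero input,
               which the loop condition guarantees. */
            int x = __builtin_ctzll(xbits);
            visit(x, y, ctx);
            xbits &= xbits - 1;              /* clear the lowest set bit */
        }
    }
}

/* Hypothetical example visitor: just counts how many bits were visited. */
static void count_visit(int x, int y, void *ctx)
{
    (void)x;
    (void)y;
    ++*(int *)ctx;
}
```

With very few unset bits per word, most rows cost a single inversion and one compare. Whether that beats the shift loop on a particular CPU is exactly what a micro-benchmark should decide.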
(And any discussion about how/if to SIMD/CUDA-ise this would be an intriguing tangent!)
Here’s a quick micro-benchmark; please run it if you can to get stats for your system, and please add your own algorithms!
The commandline:
And the code: