I’m writing a program which will need to do a very large number of binary searches (at least 10^15) in a tight loop. These, together with a small number of bitwise operations, will make up over 75% of the program’s runtime, so making them fast is important. (As implemented now, this step takes over 95% of the time, but that uses a very different implementation [not a search], which I am replacing.)
The array to be searched (of course, it need not be implemented as an array) is very small. In my current case it consists of 41 64-bit integers, though techniques for optimizing arrays of other sizes would be useful. (I’ve come across similar problems before.)
I can profile the data in advance to determine what ranges are most likely and how often there is a match. Collecting this information is not too easy but I should have it by the end of the day.
My code will be in C, perhaps using inline assembly; it will be compiled with a recent version of gcc. Responses in any language are welcome; if you prefer (e.g.) FORTRAN, I can translate.
So: How can I implement this search efficiently?
Clarification: I’m actually using the search to test membership, not to use the location in the array. A solution that discards that information is acceptable.
Final code:
/* Returns 1 if n is a power of three, else 0. n must be nonzero:
   __builtin_clzl(0) is undefined. */
long ispow3_tiny(ulong n)
{
    /* Indexed by the number of leading zero bits of n. Each power of three
       has a distinct MSB, so each slot holds at most one power; 0 marks a
       bit length that no power of three has. */
    static const ulong pow3table[] = {
#ifdef LONG_IS_64BIT
        12157665459056928801UL, 0, 4052555153018976267, 1350851717672992089, 0, 450283905890997363, 150094635296999121, 0, 50031545098999707, 0, 16677181699666569, 5559060566555523, 0, 1853020188851841, 617673396283947, 0, 205891132094649, 0, 68630377364883, 22876792454961, 0, 7625597484987, 2541865828329, 0, 847288609443, 282429536481, 0, 94143178827, 0, 31381059609, 10460353203, 0,
#endif
        3486784401, 1162261467, 0, 387420489, 0, 129140163, 43046721, 0, 14348907, 4782969, 0, 1594323, 531441, 0, 177147, 0, 59049, 19683, 0, 6561, 2187, 0, 729, 0, 243, 81, 0, 27, 9, 0, 3, 1
    };
    return n == pow3table[__builtin_clzl(n)];
}
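Since your values are powers of three, I think we can optimize greatly. Let’s look at the first few numbers in binary:
    1 = 1
    3 = 11
    9 = 1001
   27 = 11011
   81 = 1010001
  243 = 11110011
The observation is that every value has a unique most significant bit (MSB): multiplying by three always moves the MSB up by one or two positions, so no two powers of three have the same bit length.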
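The uniqueness claim is easy to check by brute force. The following is only a sketch, assuming gcc's `__builtin_clzll` and 64-bit values; the names `msb_index` and `count_pow3_with_unique_msb` are made up for illustration.

```c
#include <stdint.h>

/* Index of the most significant set bit (0..63); n must be nonzero. */
static int msb_index(uint64_t n)
{
    return 63 - __builtin_clzll(n);
}

/* Walks every power of three below 2^64. Returns how many there are if
   all of them have distinct MSB positions, or -1 if two share one. */
static int count_pow3_with_unique_msb(void)
{
    uint64_t seen = 0;   /* bit i set: some power has its MSB at bit i */
    uint64_t p = 1;
    int count = 0;

    for (;;) {
        uint64_t bit = (uint64_t)1 << msb_index(p);
        if (seen & bit)
            return -1;   /* two powers with the same MSB */
        seen |= bit;
        count++;
        if (p > UINT64_MAX / 3)
            break;       /* the next power would overflow 64 bits */
        p *= 3;
    }
    return count;
}
```

Calling `count_pow3_with_unique_msb()` should give 41, matching the 41 values mentioned in the question.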
Using the x86 bit scanning instruction (BSR, bit scan reverse), we can quickly determine the position of the MSB.
http://www.arl.wustl.edu/~lockwood/class/cs306/books/artofasm/Chapter_6/CH06-4.html#HEADING4-67
Use the MSB position as an index into a 64-entry table, and compare the table entry with the value being checked. If they are not equal, the test fails.
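If you would rather not hard-code the table, it can be filled once at startup. This is a rough sketch, not tuned code: it assumes gcc's `__builtin_clzll`, and `msb_table`, `init_msb_table` and `is_pow3_msb` are hypothetical names. It also adds an explicit zero check, which you can drop if the caller never passes 0.

```c
#include <stdint.h>

/* msb_table[i] holds the power of three whose MSB is bit i, or 0 if no
   power of three has that bit length. Static storage starts out zeroed. */
static uint64_t msb_table[64];

/* Fill the table once before the first lookup. */
static void init_msb_table(void)
{
    uint64_t p = 1;
    for (;;) {
        msb_table[63 - __builtin_clzll(p)] = p;
        if (p > UINT64_MAX / 3)
            break;       /* the next power would overflow 64 bits */
        p *= 3;
    }
}

/* Membership test: one bit scan, one load, one compare. */
static int is_pow3_msb(uint64_t n)
{
    if (n == 0)
        return 0;        /* __builtin_clzll(0) is undefined */
    return n == msb_table[63 - __builtin_clzll(n)];
}
```

Call `init_msb_table()` once, then `is_pow3_msb(n)` in the loop. The same approach should work for any set of values in which no two share an MSB position.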
Edit: j_random_hacker pointed out that the lowest 8 bits are all unique as well. You might want to implement each version and see which is fastest.
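A sketch of the low-byte variant, for comparison. It swaps the bit scan for a mask at the cost of a 256-entry table. The names `low8_table`, `init_low8_table` and `is_pow3_low8` are made up for this example, and the trick of storing 1 in slot 0 is my own assumption about how to handle n = 0 without a branch.

```c
#include <stdint.h>

/* low8_table[b] holds the power of three whose lowest byte is b. Unused
   slots hold a value whose lowest byte differs from b, so nothing with
   low byte b can ever compare equal to them. */
static uint64_t low8_table[256];

/* Fill the table once before the first lookup. */
static void init_low8_table(void)
{
    uint64_t p = 1;
    for (;;) {
        low8_table[p & 0xFF] = p;
        if (p > UINT64_MAX / 3)
            break;       /* the next power would overflow 64 bits */
        p *= 3;
    }
    /* Powers of three are odd, so slot 0 is never claimed above. Store 1
       there so that n = 0 (low byte 0) does not match. Every other unused
       slot keeps 0, whose low byte differs from its nonzero index. */
    low8_table[0] = 1;
}

/* Membership test: one mask, one load, one compare, no zero check. */
static int is_pow3_low8(uint64_t n)
{
    return n == low8_table[n & 0xFF];
}
```

Whether this beats the MSB version probably depends on the CPU and on cache pressure from the larger table, so it is worth timing both.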