I’ve been looking into faceted search with Lucene.NET. I’ve found a brilliant example here which explains a fair amount, apart from the fact that it completely overlooks the function which checks the cardinality of items in a bit array.
Can anyone give me a rundown of what it is doing? The main things I don’t understand are why the bitsSetArray is created as it is, what it is used for, and how all the if statements work in the for loop.
This may be a big ask, but I have to understand how this works before I can even think of using it in my own code.
Thanks
using System.Collections;
using System.Reflection;

// _bitsSetArray256[n] = number of 1 bits in the byte value n (0..255); static, so it is allocated once rather than on every call.
private static readonly byte[] _bitsSetArray256 = new byte[] {0, 1, 1, 2, 1, 2, 2, 3, 1, 2, 2, 3, 2, 3, 3, 4, 1, 2, 2, 3, 2, 3, 3, 4, 2, 3, 3, 4, 3, 4, 4, 5, 1, 2, 2, 3, 2, 3, 3, 4, 2, 3, 3, 4, 3, 4, 4, 5, 2, 3, 3, 4, 3, 4, 4, 5, 3, 4, 4, 5, 4, 5, 5, 6, 1, 2, 2, 3, 2, 3, 3, 4, 2, 3, 3, 4, 3, 4, 4, 5, 2, 3, 3, 4, 3, 4, 4, 5, 3, 4, 4, 5, 4, 5, 5, 6, 2, 3, 3, 4, 3, 4, 4, 5, 3, 4, 4, 5, 4, 5, 5, 6, 3, 4, 4, 5, 4, 5, 5, 6, 4, 5, 5, 6, 5, 6, 6, 7, 1, 2, 2, 3, 2, 3, 3, 4, 2, 3, 3, 4, 3, 4, 4, 5, 2, 3, 3, 4, 3, 4, 4, 5, 3, 4, 4, 5, 4, 5, 5, 6, 2, 3, 3, 4, 3, 4, 4, 5, 3, 4, 4, 5, 4, 5, 5, 6, 3, 4, 4, 5, 4, 5, 5, 6, 4, 5, 5, 6, 5, 6, 6, 7, 2, 3, 3, 4, 3, 4, 4, 5, 3, 4, 4, 5, 4, 5, 5, 6, 3, 4, 4, 5, 4, 5, 5, 6, 4, 5, 5, 6, 5, 6, 6, 7, 3, 4, 4, 5, 4, 5, 5, 6, 4, 5, 5, 6, 5, 6, 6, 7, 4, 5, 5, 6, 5, 6, 6, 7, 5, 6, 6, 7, 6, 7, 7, 8};

public static int GetCardinality(BitArray bitArray)
{
    // Read BitArray's private 32-bit backing words via reflection; "m_array" is an implementation detail, not a public API.
    var array = (uint[])bitArray.GetType().GetField("m_array", BindingFlags.NonPublic | BindingFlags.Instance).GetValue(bitArray);
    int count = 0;
    for (int index = 0; index < array.Length; index++)
        // Split each 32-bit word into its four bytes and add up each byte's bit count from the table.
        count += _bitsSetArray256[array[index] & 0xFF] + _bitsSetArray256[(array[index] >> 8) & 0xFF] + _bitsSetArray256[(array[index] >> 16) & 0xFF] + _bitsSetArray256[(array[index] >> 24) & 0xFF];
    // Note: every backing bit is counted, so any bits left set beyond Length (e.g. after SetAll(true) on some runtimes) are included too.
    return count;
}
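To check what `GetCardinality` should return, it helps to have a slow but obviously correct count to compare against. Below is a minimal sketch written as a C# 9+ top-level program. `CountSetBitsByIndex` is a hypothetical helper, not part of Lucene.NET or the linked example; it uses only the public `BitArray` indexer and `Length` property, so no reflection is involved.

```csharp
using System;
using System.Collections;

// Build a small BitArray with bits 0, 2 and 3 set (binary 1101 = 13).
var bits = new BitArray(10);
bits[0] = true;
bits[2] = true;
bits[3] = true;

Console.WriteLine(CountSetBitsByIndex(bits)); // 3

// Hypothetical reference implementation: test every index through the public
// indexer. One step per bit, no reflection, and nothing past Length is examined.
static int CountSetBitsByIndex(BitArray bitArray)
{
    int count = 0;
    for (int i = 0; i < bitArray.Length; i++)
    {
        if (bitArray[i])
            count++;
    }
    return count;
}
```

It visits one bit at a time, so it is much slower than the table lookup, but because it only looks at indices below `Length` it makes a reasonable baseline when testing the reflection-based version.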
The `_bitsSetArray256` array is initialised with values such that `_bitsSetArray256[n]` contains the number of bits set in the binary representation of `n`, for `n` in `0..255`.
For example, `_bitsSetArray256[13]` equals 3, because 13 in binary is `1101`, which contains three `1`s.
The reason for doing this is that it’s far faster to pre-compute these values and store them, rather than having to work them out each time (or on demand). It’s not like the number of `1`s in the binary representation of 13 is ever going to change, after all 🙂
Within the `for` loop, we are looping through an array of `uint`s. A C# `uint` is a 32-bit quantity, i.e. made up of 4 bytes. Our lookup table tells us how many bits are set in a byte, so we must process each of the four bytes. The bit manipulation in the `count +=` line extracts each of the four bytes (shift right by 0, 8, 16 or 24 bits, then `& 0xFF` to keep only the lowest 8 bits), then gets its bit count from the lookup array. Adding together the bit counts for all four bytes gives the bit count for the `uint` as a whole.
So given a `BitArray`, this function digs into its private `m_array` member (read as a `uint[]`), then returns the total number of bits set in the binary representation of the `uint`s therein.
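The lookup table does not have to be typed out by hand. The minimal sketch below (a C# 9+ top-level program; `BuildTable` and `CountWord` are hypothetical names, not from the original example) generates the same 256 values from the rule that the bit count of `n` is the count for `n >> 1` plus the lowest bit `n & 1`. It then counts a 32-bit word the same way the `count +=` line does: shift each byte down, mask with `0xFF`, and add the four table lookups.

```csharp
using System;

byte[] table = BuildTable();

Console.WriteLine(table[13]);                     // 3 (13 is 1101 in binary)
Console.WriteLine(CountWord(0xF0F0000Fu, table)); // 12 (bytes 0x0F, 0x00, 0xF0, 0xF0)

// Hypothetical helper: table[n] = number of 1 bits in n, for n in 0..255.
// table[n] = table[n >> 1] + (n & 1); since n >> 1 < n, that entry is already filled.
static byte[] BuildTable()
{
    var t = new byte[256];
    for (int n = 1; n < 256; n++)
        t[n] = (byte)(t[n >> 1] + (n & 1));
    return t;
}

// Hypothetical helper: count the 1 bits in one 32-bit word, one byte at a time.
static int CountWord(uint word, byte[] t)
{
    return t[word & 0xFF]          // bits 0-7
         + t[(word >> 8) & 0xFF]   // bits 8-15
         + t[(word >> 16) & 0xFF]  // bits 16-23
         + t[(word >> 24) & 0xFF]; // bits 24-31
}
```

Generating the table trades the long literal for a few hundred operations at startup. On .NET Core 3.0 and later, `System.Numerics.BitOperations.PopCount(uint)` returns the per-word count directly and may use a hardware instruction where one is available.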