I have a table that has about 10000 entries each entry has almost 100 boolean values. A user checkboxes a bunch of the booleans and hopes to get a result that matches their request. If that record doesn’t exist, I want to show them maybe 5 records that are close(have only 1 or two values different). Is there a good hash system or data structure that can help me find these results.
Share
Bitmap indices. Google for the paper if you want the complete background, it’s not easy but worth a read. Basically build bitmpas for your boolean values like this:
And then just XOR your filter through them, sort by number of matches, return. Since all operations are insanely fast (about one cycle per element, and the data structure uses (edit) 100 bits of memory per element), this will usually work even though it’s linear.
Addendum: How to XOR. (fixed a bug)
The two 1’s tell you that there are 2 errors and m-2 matches, where m is the number of bits.