I have this problem in calculating Jaccard Similarity for Sets (Bit-Vectors):
v1 = 10111
v2 = 10011
Size of intersection = 3 (How could we find it out?)
Size of union = 4 (How could we find it out?)
Jaccard similarity = (intersection/union) = 3/4
But I don’t understand how we could find the “intersection” and “union” of the two vectors.
Please help me.
Presumably your definitions of “intersection” and “union” are “number of bits set in both values” and “number of bits set in either value”, which (assuming you’re using something like `int` or `long` for the vectors) are `CountBits(v1 & v2)` and `CountBits(v1 | v2)` respectively.
Next you just need to implement `CountBits`. This Stack Overflow question can help you there.
Instead of using `int` or `long`, you may want to use `BitArray`. It has `And` and `Or` methods, but note that they modify the instance they are called on (and return that same instance), so work on a copy if you need to keep the original values. You’d also need to work out the best way of counting the bits set in a `BitArray`; just `array.Cast<bool>().Count(bit => bit)` may well work.
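To make both suggestions concrete, here is a rough sketch rather than a definitive implementation. It assumes .NET Core 3.0 or later, so that `BitOperations.PopCount` can stand in for a hand-written `CountBits`. The class name `JaccardSketch`, the method name `Jaccard`, and the choice to return 1.0 when both vectors are empty are illustrative assumptions of mine, not something from the question.

```csharp
using System;
using System.Collections;
using System.Linq;
using System.Numerics;

public static class JaccardSketch
{
    // Counts the set bits. BitOperations.PopCount requires .NET Core 3.0+;
    // on older frameworks you would write the bit-counting loop yourself.
    public static int CountBits(uint value) => BitOperations.PopCount(value);

    // Integer version: AND keeps bits set in both values,
    // OR keeps bits set in either value.
    public static double Jaccard(uint v1, uint v2)
    {
        int intersectionSize = CountBits(v1 & v2);
        int unionSize = CountBits(v1 | v2);
        // Assumption: two empty vectors are treated as identical.
        return unionSize == 0 ? 1.0 : (double)intersectionSize / unionSize;
    }

    // BitArray version: And/Or modify the instance they are called on,
    // so each operation runs on a fresh copy. Both arrays must have the same length.
    public static double Jaccard(BitArray a, BitArray b)
    {
        int intersectionSize = new BitArray(a).And(b).Cast<bool>().Count(bit => bit);
        int unionSize = new BitArray(a).Or(b).Cast<bool>().Count(bit => bit);
        return unionSize == 0 ? 1.0 : (double)intersectionSize / unionSize;
    }

    public static void Main()
    {
        // The vectors from the question: 10111 and 10011.
        uint v1 = 0b10111, v2 = 0b10011;
        Console.WriteLine(Jaccard(v1, v2)); // intersection 3, union 4

        var a = new BitArray(new[] { true, false, true, true, true });
        var b = new BitArray(new[] { true, false, false, true, true });
        Console.WriteLine(Jaccard(a, b)); // same vectors, same ratio of 3/4
    }
}
```

For the vectors in the question, `v1 & v2` is 10011 (three bits set) and `v1 | v2` is 10111 (four bits set), which gives the 3/4 you quoted.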