Given a list of sets:
- S_1 : [ 1, 2, 3, 4 ]
- S_2 : [ 3, 4, 5, 6, 7 ]
- S_3 : [ 8, 9, 10, 11 ]
- S_4 : [ 1, 8, 12, 13 ]
- S_5 : [ 6, 7, 14, 15, 16, 17 ]
What the most efficient way to merge all sets that share at least 2 elements? I suppose this is similar to a connected components problem. So the result would be:
- [ 1, 2, 3, 4, 5, 6, 7, 14, 15, 16, 17] (S_1 UNION S_2 UNION S_5)
- [ 8, 9, 10, 11 ]
- [ 1, 8, 12, 13 ] (S_4 shares 1 with S_1, and 8 with S_3, but not merged because they only share one element in each)
The naive implementation is O(N^2), where N is the number of sets, which is unworkable for us. This would need to be efficient for millions of sets.
My head is saying this is about Order (2N ln N). Take that with a grain of salt.