What is the best algorithm to find the sets in a finite collection of sets that are subsets of a specific set?
For example, if
A = {1, 2}
B = {2, 3, 4}
C = {3, 5}
D = {6}
and X = {1, 2, 3, 5}
Then, A and C are subsets of X.
Is there an algorithm that can do this in linear time?
Implementation note: The members of the sets generally come from a very limited range, so it could be a good idea to use a C++ bitset to implement the algorithm. Would that work?
Edit: The number of sets in the collection is generally much greater than the number of elements in X (from the example). Is there a way to do this in time linear in the number of elements in X? Perhaps using hashing or something similar?
Let’s assume for a moment 64 possible elements.
Then, if you represent each element as a bit, you can use a 64-bit integer to represent each set, and then:
a & b is the set intersection of a and b.
If (and only if) a is a subset of b, then a & b == a.
Of course, you can use a bitset if you need more than 64 bits.
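The subset test above could be sketched in C++ roughly as follows. This is only an illustration under the 64-element assumption: the helper names to_mask, is_subset and find_subsets are made up for this example, and elements are assumed to be integers in the range 0 to 63.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <initializer_list>
#include <vector>

// Pack a set into a 64-bit mask: bit i is set exactly when i is in the set.
// Assumption: every element lies in [0, 63].
std::uint64_t to_mask(std::initializer_list<int> elements) {
    std::uint64_t mask = 0;
    for (int e : elements) {
        assert(e >= 0 && e < 64);
        mask |= std::uint64_t{1} << e;
    }
    return mask;
}

// a is a subset of b exactly when intersecting a with b leaves a unchanged.
bool is_subset(std::uint64_t a, std::uint64_t b) {
    return (a & b) == a;
}

// Return the indices of the masks in `collection` that are subsets of `x`.
std::vector<std::size_t> find_subsets(const std::vector<std::uint64_t>& collection,
                                      std::uint64_t x) {
    std::vector<std::size_t> result;
    for (std::size_t i = 0; i < collection.size(); ++i) {
        if (is_subset(collection[i], x)) {
            result.push_back(i);
        }
    }
    return result;
}
```

With the sets from the question, find_subsets would report A and C. The same (a & b) == a test also works on std::bitset<N> when the range is wider than 64 elements.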
For a large range of elements, you can store the superset in a hash table (once) and then iterate over the potential subsets, checking whether all of their elements are in it.
It is linear in the input size (average case).
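A minimal sketch of the hash-based approach, assuming each candidate set is stored as a std::vector<int> of distinct elements and using std::unordered_set for the superset. The function name find_subsets_hashed is hypothetical, chosen only for this example.

```cpp
#include <cstddef>
#include <unordered_set>
#include <vector>

// Return the indices of the sets in `collection` whose elements all appear in `x`.
// Assumption: each candidate set holds distinct elements.
std::vector<std::size_t> find_subsets_hashed(
        const std::vector<std::vector<int>>& collection,
        const std::vector<int>& x) {
    // Build the hash table for the superset once: O(|X|) on average.
    const std::unordered_set<int> superset(x.begin(), x.end());

    std::vector<std::size_t> result;
    for (std::size_t i = 0; i < collection.size(); ++i) {
        const std::vector<int>& candidate = collection[i];
        // A set with more distinct elements than X cannot be a subset of X.
        if (candidate.size() > superset.size()) {
            continue;
        }
        bool contained = true;
        for (int e : candidate) {
            if (superset.count(e) == 0) {  // average O(1) lookup
                contained = false;
                break;  // stop at the first element missing from X
            }
        }
        if (contained) {
            result.push_back(i);
        }
    }
    return result;
}
```

The size check means no candidate is scanned for more than |X| elements, so each set costs at most min(its size, |X|) lookups on average.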
EDIT (in response to the edited question):
Unless you have pre-stored some information about the data, it cannot be done better than
O(|X| + n*min{m,|X|})
where |X| is the size of the set X, n is the number of sets, and m is the average size of the sets.
The reason is that, in the worst case, you need to read all the elements of every set (because the last element you read for each set decides whether it is a subset or not), so we cannot do better without prior knowledge of the sets.
The suggested solutions are:
Bitset: O(|X|*n)
Hash solution: O(|X| + min{m,|X|}*n) (average case)
Although the hash solution provides better asymptotic complexity, the constants are much better for a bitset, and thus the bitset solution will probably be faster for small |X|.