I am looking for an efficient way to determine if a set is a subset of another set in Matlab or Mathematica.
Example:
Set A = [1 2 3 4]
Set B = [4 3]
Set C = [3 4 1]
Set D = [4 3 2 1]
The output should be: Set A
Sets B and C are subsets of set A because A contains all of their elements, so they can be deleted (the order of elements in a set doesn’t matter). Set D has the same elements as set A, and since set A precedes set D, I would like to simply keep set A and delete set D.
So there are two essential rules:
1. Delete a set if it is a subset of another set
2. Delete a set if its elements are the same as those of a preceding set
My Matlab code is not very efficient at doing this – it mostly consists of nested loops.
Suggestions are very welcome!
Additional explanation: the issue is that with a large number of sets there will be a very large number of pairwise comparisons.
You will likely want to take a look at the built-in set operation functions in MATLAB. Why reinvent the wheel if you don’t have to? 😉
HINT: The ISMEMBER function may be of particular interest to you.
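As a minimal sketch of how ISMEMBER can answer the subset question for a single pair of sets: ISMEMBER returns one logical value per element of its first argument, so wrapping it in ALL gives a single true/false result. Sets A and B are taken from the question; set E and the variable names isSubsetB and isSubsetE are hypothetical, added only for illustration.

```matlab
A = [1 2 3 4];
B = [4 3];
E = [4 5];   % hypothetical set that is NOT a subset of A

% ismember(B, A) is a logical array with one entry per element of B;
% B is a subset of A only if every entry is true.
isSubsetB = all(ismember(B, A));
isSubsetE = all(ismember(E, A));   % 5 is not in A, so this is false
```

Note that two sets with the same elements pass this test in both directions, so the same check also detects duplicates.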
EDIT:
Here’s one way you can approach this problem using nested loops that are set up to reduce the number of potential iterations. First, we can use the suggestion in Marc’s comment to sort the list of sets by their number of elements so that they are arranged largest to smallest.
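A minimal sketch of this sorting step, assuming the sets are stored in a cell array called setList and that no set contains repeated elements (otherwise the element count overstates the set size). The names setSizes and sortIndex are illustrative choices. MATLAB's SORT is documented to keep equal values in their original order, so set A should stay ahead of set D.

```matlab
% Sample sets from the question, stored in a cell array (assumed layout)
setList = {[1 2 3 4], [4 3], [3 4 1], [4 3 2 1]};

nSets = numel(setList);                       % number of sets
setSizes = cellfun(@numel, setList);          % element count of each set
[~, sortIndex] = sort(setSizes, 'descend');   % order from largest to smallest
setList = setList(sortIndex);                 % reorder the sets accordingly
```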
Now we can set up our loops to start with the smallest sets at the end of the list and compare them first to the largest sets at the start of the list, to increase the odds that we will find a superset quickly (i.e. we’re banking on larger sets being more likely to contain smaller sets). When a superset is found, we remove the subset from the list and break the inner loop.
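A sketch of such a loop structure, assuming setList already holds the sample sets sorted from largest to smallest (written out literally here so the snippet runs on its own). The loop variable names outerLoop and innerLoop are illustrative.

```matlab
% Sample sets, already sorted by size in descending order (assumed input)
setList = {[1 2 3 4], [4 3 2 1], [3 4 1], [4 3]};
nSets = numel(setList);

% Walk backwards from the smallest set so that deleting an entry
% never shifts the indices of sets that are still to be visited.
for outerLoop = nSets:-1:2
  % Compare against the sets that precede it, largest first
  for innerLoop = 1:(outerLoop-1)
    if all(ismember(setList{outerLoop}, setList{innerLoop}))
      setList(outerLoop) = [];   % subset or duplicate: remove it
      break;                     % no need to check further supersets
    end
  end
end
```

For the sample data, each of the three smaller or duplicate sets is found inside [1 2 3 4] on the first comparison, leaving only that set in setList.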
After running the above code, setList will have all sets removed from it that are either subsets or duplicates of other sets preceding them in the list. In the best case scenario (e.g. the sample data in your question) the inner loop breaks after the first iteration every time, performing only nSets-1 set comparisons using ISMEMBER. In the worst case scenario the inner loop never breaks and it will perform (nSets-1)*nSets/2 set comparisons.