I have a dataset with 10 columns. The first column is a unique identifier. The other 9 columns are related attributes. For now, let’s just say they are integers. If needed, the data could easily be pivoted to a key-value layout.
Example:
id|attr1|attr2|attr3|...
a | 2 | 5 | 7 |...
b | 3 | 1 |null |...
c | 2 |null |null |...
d | 1 | 2 | 5 |...
e | 2 | 1 | 3 |...
I’m essentially looking for the most frequent combinations of attribute values within a row, of any length from pairs upward. So my output for this would be:
unq | frequency
1,2 | 2
1,3 | 2
1,5 | 1
2,3 | 1
2,5 | 2
2,7 | 1
5,7 | 1
1,2,3 | 1
1,2,5 | 1
2,5,7 | 1
(I did this manually, so hopefully there are no errors.) The order within a combination doesn’t matter: 2,5,7 = 5,2,7 = 7,5,2, etc.
Any thoughts? I am open to different tools. I have access to R, Excel, SQL Server, MySQL, etc.
Excel is preferred but not required!
Here is a solution in R:
Recreate the data
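A minimal sketch of this step, assuming only the three attribute columns shown in the question (the remaining six are elided there) and using NA for the null cells. The name dat is a placeholder, not something from the original post.

```r
# Sample rows from the question; NA stands in for the "null" cells.
# Only attr1..attr3 are shown in the question, so only those are recreated.
dat <- data.frame(
  id    = c("a", "b", "c", "d", "e"),
  attr1 = c(2, 3, 2, 1, 2),
  attr2 = c(5, 1, NA, 2, 1),
  attr3 = c(7, NA, NA, 5, 3),
  stringsAsFactors = FALSE
)

dat
```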
Create a function to list all the combinations
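One way this function might look, built on combn() from base R. The name list_combos is hypothetical. The sketch assumes that NA values are skipped, that order within a combination is irrelevant (values are sorted first), and that a value repeated within one row counts once.

```r
# list_combos() is a hypothetical helper name, not taken from the original post.
# Input: the attribute values of one row. Output: every combination of size >= 2
# as a comma-separated label such as "2,5,7".
list_combos <- function(x) {
  # Drop NAs; sort so that 5,2,7 and 2,5,7 give the same label.
  # Assumption: a value repeated within one row is counted once.
  x <- sort(unique(x[!is.na(x)]))
  if (length(x) < 2) return(character(0))  # nothing to pair up
  unlist(lapply(2:length(x), function(k) {
    # combn() applies paste(..., collapse = ",") to each size-k subset
    combn(x, k, FUN = paste, collapse = ",")
  }))
}

list_combos(c(2, 5, 7))  # row "a" from the question
```

The length check matters for more than speed: when combn() is given a single positive integer it treats it as seq_len(x), so rows with fewer than two values have to be handled before the call.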
Create a second function to count the combinations
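A sketch of the counting step, assuming the combinations arrive as a list with one character vector of comma-separated labels per row. The name count_combos is hypothetical, and the output column names unq and frequency simply mirror the question's table.

```r
# count_combos() is a hypothetical name. It expects a list with one character
# vector of combination labels per row, e.g. list(c("1,2", "1,3"), "1,2").
count_combos <- function(combos_by_row) {
  counts <- table(unlist(combos_by_row))  # tally identical labels across rows
  out <- data.frame(
    unq       = names(counts),
    frequency = as.integer(counts),
    stringsAsFactors = FALSE
  )
  # Shorter combinations first, then by label within the same length.
  n_items <- lengths(strsplit(out$unq, ",", fixed = TRUE))
  out <- out[order(n_items, out$unq, method = "radix"), ]
  rownames(out) <- NULL
  out
}

# Two toy rows: the first has values 1,2,3, the second has values 1,3.
count_combos(list(c("1,2", "1,3", "2,3", "1,2,3"), "1,3"))
```

To rank by popularity instead, sort the returned data frame on frequency in decreasing order.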
The results:
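For reference, a condensed, self-contained sketch that computes the frequency table for the five sample rows in the question. It assumes the three attribute columns shown there, NA for null, and one count per distinct value in a row. The variable names are placeholders.

```r
# Attribute columns only; the id column is not needed for counting.
dat <- data.frame(
  attr1 = c(2, 3, 2, 1, 2),
  attr2 = c(5, 1, NA, 2, 1),
  attr3 = c(7, NA, NA, 5, 3)
)

# All combinations of size >= 2 in every row, as labels like "2,5,7".
combos <- unlist(apply(dat, 1, function(row) {
  x <- sort(unique(row[!is.na(row)]))
  if (length(x) < 2) return(NULL)
  unlist(lapply(2:length(x), function(k) combn(x, k, FUN = paste, collapse = ",")))
}))

freq <- table(combos)
res <- data.frame(unq = names(freq), frequency = as.integer(freq),
                  stringsAsFactors = FALSE)
# Pairs first, then triples; ordered by label within each size.
res <- res[order(lengths(strsplit(res$unq, ",", fixed = TRUE)), res$unq,
                 method = "radix"), ]
rownames(res) <- NULL
res
```

For the sample data this gives ten combinations: 1,2 (2), 1,3 (2), 1,5 (1), 2,3 (1), 2,5 (2), 2,7 (1), 5,7 (1), 1,2,3 (1), 1,2,5 (1) and 2,5,7 (1).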