I have a dataset with 10 columns. The first column is an unique identifier.

Question

0

Asked: May 23, 20262026-05-23T00:57:31+00:00 2026-05-23T00:57:31+00:00

I have a dataset with 10 columns. The first column is an unique identifier.

0

I have a dataset with 10 columns. The first column is an unique identifier. The 9 other columns are related attributes. For now, let’s just say they are integers. If needed, the data could easily be pivoted to a key-value.

Ex:

id|attr1|attr2|attr3|...
a |  2  |  5  |  7  |...
b |  3  |  1  |null |...
c |  2  |null |null |...
d |  1  |  2  |  5  |...
e |  2  |  1  |  3  |...

I’m essentially looking for the most frequent combinations of any length with at least a pair. So my output for this would be:

unq   | frequency
1,2   | 2
1,3   | 2
1,5   | 1
2,3   | 1
2,5   | 2
2,7   | 1
1,2,3 | 1
1,2,5 | 1
2,5,7 | 1

(did this manually – so hopefully there are no errors) – the order of the paring doesn’t matter. 2,5,7 = 5,2,7 = 7,5,2 etc.

Any thoughts? I am open to different tools. I have access to R, excel, sql server, mysql, etc.

Excel is preferred but not required!

Report

Leave an answer
Cancel reply

You must login to add an answer.

Need An Account,

1 Answer

Editorial Team · Answer 1 · 2026-05-23T00:57:32+00:00

Here is a solution in R:

Recreate the data

x <- data.frame(
    id = letters[1:5],
    attr1 = c(2,3,2,1,2),
    attr2 = c(5,1,NA,2,1),
    attr3 = c(7,NA,NA,5,3))
x

  id attr1 attr2 attr3
1  a     2     5     7
2  b     3     1    NA
3  c     2    NA    NA
4  d     1     2     5
5  e     2     1     3

Create a function to list all the combinations

make_combinations <- function(data, size){
  t1 <- apply(data[, -1], 1, function(data)unname(sort(data)))
  t2 <- lapply(t1, function(xt){if(length(xt)>=size){combn(xt, size)}})
  t3 <- sapply(t2[!is.na(t2)], 
      function(chunk){if(!is.null(chunk))apply(chunk, 2, function(x)paste(x, collapse=","))})
  t4 <- unlist(t3)
  t4
}

Create a second function to count the combinations

count_combinations <- function(data, nn=2:3){
  tmp <- unlist(lapply(nn, function(n)make_combinations(data, n)))
  sort(table(tmp), decreasing=TRUE)
}

The results:

count_combinations(x, 2:3)


  1,2   1,3   2,5 1,2,3 1,2,5   1,5   2,3 2,5,7   2,7   5,7 
    2     2     2     1     1     1     1     1     1     1

Sign Up

Sign In

Forgot Password

The Archive Base Latest Questions

I have a dataset with 10 columns. The first column is an unique identifier.

Leave an answerCancel reply

1 Answer

Leave an answer
Cancel reply