I have a ‘long-form’ data frame with columns id (the primary key) and featureCode (categorical variable). Each record has between 1 and 9 values of the categorical variable. For example:
id featureCode
5 PPLC
5 PCLI
6 PPLC
6 PCLI
7 PPL
7 PPLC
7 PCLI
8 PPLC
9 PPLC
10 PPLC
I’d like to calculate the number of times each feature code is used together with each of the other feature codes (the “pairwise counts” of the title). At this stage, the order in which the feature codes are used is not important. I envisage the result as another data frame, where the rows and columns are feature codes and the cells are counts. For example:
     PPLC PCLI PPL
PPLC    0    3   1
PCLI    3    0   1
PPL     1    1   0
Unfortunately, I don’t know how to perform this calculation and I’ve drawn a blank when searching for advice (mostly, I suspect, because I don’t know the correct terminology).
Here is a data.table approach similar to @mrdwab's. It will work best if featureCode is a character column.
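A minimal sketch of how such a data.table approach could look, as an illustration rather than a definitive implementation. It assumes the data.table package is installed. The names dt, pairs and counts are made up for this example. The idea is to self-join the table on id so that every pair of codes sharing an id appears as one row, drop the rows where a code is paired with itself, and then cast the pairs to wide form with counts in the cells.

```r
library(data.table)

# Sample data from the question; featureCode is kept as character
dt <- data.table(
  id = c(5L, 5L, 6L, 6L, 7L, 7L, 7L, 8L, 9L, 10L),
  featureCode = c("PPLC", "PCLI", "PPLC", "PCLI", "PPL",
                  "PPLC", "PCLI", "PPLC", "PPLC", "PPLC")
)

# Assumption: a code should count once per id, so drop duplicate rows first
dt <- unique(dt)

# Self-join on id: one row per ordered pair of codes that share an id
pairs <- merge(dt, dt, by = "id", allow.cartesian = TRUE)

# Remove the pairs where a code is matched with itself
pairs <- pairs[featureCode.x != featureCode.y]

# Wide form: rows and columns are feature codes, cells are pair counts;
# combinations that never occur (including the diagonal) are filled with 0
counts <- dcast(pairs, featureCode.x ~ featureCode.y,
                fun.aggregate = length, value.var = "id", fill = 0L)

print(counts)
```

For the sample data this should give 3 for the PPLC/PCLI pair and 1 for each pair involving PPL, matching the table in the question, although the codes come out in alphabetical order rather than the order shown there. Because the join produces both orderings of every pair, the result is symmetric.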