I have a dataset with about 3 million rows and the following structure: PatientID|

Question

0

Asked: May 26, 20262026-05-26T20:20:31+00:00 2026-05-26T20:20:31+00:00

I have a dataset with about 3 million rows and the following structure: PatientID|

0

I have a dataset with about 3 million rows and the following structure:

PatientID| Year | PrimaryConditionGroup
---------------------------------------
1        | Y1   | TRAUMA
1        | Y1   | PREGNANCY
2        | Y2   | SEIZURE
3        | Y1   | TRAUMA

Being fairly new to R, I have some trouble finding the right way to reshape the data into the structure outlined below:

PatientID| Year | TRAUMA | PREGNANCY | SEIZURE
----------------------------------------------
1        | Y1   | 1      | 1         | 0
2        | Y2   | 0      | 0         | 1
3        | Y1   | 1      | 0         | 1

My question is: What is the fastest/most elegant way to create a data.frame, where the values of PrimaryConditionGroup become columns, grouped by PatientID and Year (counting the number of occurences)?

Report

Leave an answer
Cancel reply

You must login to add an answer.

Need An Account,

1 Answer

Editorial Team · Answer 1 · 2026-05-26T20:20:31+00:00

There are probably more succinct ways of doing this, but for sheer speed, it’s hard to beat a data.table-based solution:

df <- read.table(text="PatientID Year  PrimaryConditionGroup
1         Y1    TRAUMA
1         Y1    PREGNANCY
2         Y2    SEIZURE
3         Y1    TRAUMA", header=T)

library(data.table)
dt <- data.table(df, key=c("PatientID", "Year"))

dt[ , list(TRAUMA =    sum(PrimaryConditionGroup=="TRAUMA"),
           PREGNANCY = sum(PrimaryConditionGroup=="PREGNANCY"),
           SEIZURE =   sum(PrimaryConditionGroup=="SEIZURE")),
   by = list(PatientID, Year)]

#      PatientID Year TRAUMA PREGNANCY SEIZURE
# [1,]         1   Y1      1         1       0
# [2,]         2   Y2      0         0       1
# [3,]         3   Y1      1         0       0

EDIT: aggregate() provides a ‘base R’ solution that might or might not be more idiomatic. (The sole complication is that aggregate returns a matrix, rather than a data.frame; the second line below fixes that up.)

out <- aggregate(PrimaryConditionGroup ~ PatientID + Year, data=df, FUN=table)
out <- cbind(out[1:2], data.frame(out[3][[1]]))

2nd EDIT Finally, a succinct solution using the reshape package gets you to the same place.

library(reshape)
mdf <- melt(df, id=c("PatientID", "Year"))
cast(PatientID + Year ~ value, data=j, fun.aggregate=length)

Sign Up

Sign In

Forgot Password

The Archive Base Latest Questions

I have a dataset with about 3 million rows and the following structure: PatientID|

Leave an answerCancel reply

1 Answer

Leave an answer
Cancel reply