Background
I have a data frame shaped like:
example = structure(list(sid = c(39, 40, 41, 42, 42, 43, 43, 44, 45, 45,
46, 46, 47, 48, 49, 49, 50, 51, 52, 52, 53), monthday = c("42",
"44", "46", "410", "428", "423", "49", "411", "416", "430", "418",
"426", "419", "420", "420", "53", "421", "424", "425", "53",
"511")), .Names = c("sid", "monthday"), row.names = c(301L, 300L,
298L, 296L, 282L, 288L, 297L, 295L, 294L, 281L, 293L, 285L, 292L,
291L, 290L, 278L, 289L, 287L, 286L, 279L, 270L), class = "data.frame")
In other words, it is tall:
sid monthday
39 42
40 44
41 46
42 410
42 428
43 423
43 49
Ultimately, I would like to make it into a wide format:
sid monthday1 monthday2
39 42 NA
40 44 NA
41 46 NA
42 410 428
43 423 49
etc
I’ve been trying things with the reshape and reshape2 packages, and also with aggregate, like:
library(reshape2)
temp = melt(example,id.vars=c("sid"))
data.wide <- dcast(temp, sid ~ variable, value.var="value")
But I can’t wrap my brain around it. It occurs to me that if I could identify the occurrence of each sid, I could solve my problem.
Immediate Problem
So how can I take the sid column of the tall data above and make a new variable that indicates the occurrence of each sid, like this:
sid occur
39 1
40 1
41 1
42 1
42 2
43 1
43 2
The occur variable indicates that sid values 39, 40, and 41 appear only once, while 42 and 43 have first and second instances. If I only ever had two instances, I could use duplicated() and convert that to numeric, but what is a solution that generalizes to an arbitrary number of instances?
You can use ave to generate your “times”:
Or, with dcast from “reshape2” after adding your time variable:
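A minimal sketch of both suggestions, using only the first seven rows of the question's data. The column name `label`, the result names `wide` and `wide2`, and the use of base `reshape()` for the first variant are illustrative assumptions, not necessarily the code the answer originally showed.

```r
# First seven rows of the question's `example` data frame.
example <- data.frame(
  sid = c(39, 40, 41, 42, 42, 43, 43),
  monthday = c("42", "44", "46", "410", "428", "423", "49"),
  stringsAsFactors = FALSE
)

# ave() applies seq_along() within each sid group, so the n-th row of a
# given sid gets the value n, for any number of repeats.
example$occur <- ave(example$sid, example$sid, FUN = seq_along)
# occur is now 1 1 1 1 2 1 2

# Base R: go wide with occur as the time variable.
# sep = "" yields column names monthday1, monthday2, ...
wide <- reshape(example, idvar = "sid", timevar = "occur",
                direction = "wide", sep = "")

# reshape2 variant: build the future column names first, then cast.
# Guarded so the sketch still runs where reshape2 is not installed.
if (requireNamespace("reshape2", quietly = TRUE)) {
  example$label <- paste0("monthday", example$occur)
  wide2 <- reshape2::dcast(example, sid ~ label, value.var = "monthday")
}
```

Because `seq_along` counts rows within each group, a sid that appears three or more times simply gets 3, 4, and so on, and the wide result gains a matching monthday3, monthday4 column with NA wherever a sid has fewer occurrences.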