I have a data.frame df in “long” format.
df <- data.frame(site  = rep(c("A","B","C"), 1, 7),
                 time  = c(11,11,11,22,22,22,33),
                 value = ceiling(rnorm(7)*10))
df <- df[order(df$site), ]
df
  site time value
1    A   11    12
2    A   22   -24
3    A   33   -30
4    B   11     3
5    B   22    16
6    C   11     3
7    C   22     9
Question
How do I remove the rows whose df$time value is not present for every level of df$site?
In this case I want to remove df[3,], because the timestamp 33 in df$time is present only for site A and not for sites B and C.
Desired output:
df.trimmed
  site time value
1    A   11    12
2    A   22   -24
4    B   11     3
5    B   22    16
6    C   11     3
7    C   22     9
The data.frame has easily 800k rows and 200k unique timestamps. I don’t want to use loops but I don’t know how to use vectorized functions like apply() or lapply() for this case.
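For reference, the condition described above can be written without an explicit loop in base R. This is only a sketch under a few assumptions: the example values are copied from the printed output instead of being drawn with rnorm(), there are no NA values in site or time, and the names n_sites and sites_per_time are made up for illustration.

```r
# Example data with fixed values (copied from the printed output above)
df <- data.frame(site  = c("A", "A", "A", "B", "B", "C", "C"),
                 time  = c(11, 22, 33, 11, 22, 11, 22),
                 value = c(12, -24, -30, 3, 16, 3, 9))

# Number of distinct sites in the data
n_sites <- length(unique(df$site))

# For every row, count the distinct sites that share that row's timestamp.
# ave() applies FUN within each time group and recycles the single count
# back onto all rows of the group.
sites_per_time <- ave(as.integer(factor(df$site)), df$time,
                      FUN = function(s) length(unique(s)))

# Keep only the rows whose timestamp occurs at every site
df.trimmed <- df[sites_per_time == n_sites, ]
df.trimmed
```

On the example data this drops the single row with time 33 and keeps the other six. Whether ave() is fast enough for 800k rows and 200k timestamps is something I have not measured.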
Here’s another possible solution using the data.table package:
EDIT from Matthew
Nice. Or a slightly more direct way:
If no time is present at all sites, the final result should be empty (as Ben pointed out in the comments). In that case the step marked * above could be:
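As a minimal sketch of one data.table way to do this filter (an assumption on my part, not necessarily the code referred to above; the names dt, n_sites, common_times and dt.trimmed are illustrative, and uniqueN() needs a reasonably recent data.table):

```r
library(data.table)

# Example data with fixed values (copied from the question's printed output)
dt <- data.table(site  = c("A", "A", "A", "B", "B", "C", "C"),
                 time  = c(11, 22, 33, 11, 22, 11, 22),
                 value = c(12, -24, -30, 3, 16, 3, 9))

# Number of distinct sites in the data
n_sites <- uniqueN(dt$site)

# Timestamps observed at every site; this is a zero-length vector
# when no timestamp is shared by all sites
common_times <- dt[, .(n = uniqueN(site)), by = time][n == n_sites, time]

# Keep only rows with a shared timestamp; with an empty common_times
# this returns a zero-row data.table rather than NA rows
dt.trimmed <- dt[time %in% common_times]
dt.trimmed
```

Filtering with %in% instead of a keyed join is a deliberate choice here: when no time is present at all sites, the result is simply empty, which is the behaviour asked for in that case.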