I have a data frame containing longitudinal measurements of variables x and y at various time points (time) for several subjects (id). However, x and y have some missing values.
What I want is to aggregate the data frame so that, for each id, I get the first non-missing (earliest in time) value of x and of y. The x and y values may then come from different time points, but that does not matter.
testdf <- data.frame(id = c(rep("A", 4), rep("B", 4), rep("C", 4)),
                     x = c(NA, NA, 1, 2, 3, NA, NA, 1, 2, NA, NA, 5),
                     y = rev(c(NA, NA, 1, 2, 3, NA, NA, 1, 2, NA, NA, 5)),
                     time = c(1, 2, 3, 4, 0.1, 0.5, 10, 20, 3, 2, 1, 0.5))
After aggregation, testdf should reduce to:
  id x y
1  A 1 5
2  B 3 1
3  C 5 1
UPDATE: Would it be possible to have a solution that works when the data frame has a large number of variables, i.e. a solution or function where you don't have to name the x and y variables explicitly?
Is this what you want?
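A minimal base-R sketch of one way to do this, assuming "first" means "earliest time within each id": sort the rows by id and then time, and take the first non-missing value of x and y per group with aggregate(). The helper first_non_na is a hypothetical name introduced for illustration. Note that na.action = na.pass is needed, because the formula method's default (na.omit) would drop every row containing an NA before the summary function runs.

```r
# Example data from the question
testdf <- data.frame(id = c(rep("A", 4), rep("B", 4), rep("C", 4)),
                     x = c(NA, NA, 1, 2, 3, NA, NA, 1, 2, NA, NA, 5),
                     y = rev(c(NA, NA, 1, 2, 3, NA, NA, 1, 2, NA, NA, 5)),
                     time = c(1, 2, 3, 4, 0.1, 0.5, 10, 20, 3, 2, 1, 0.5))

# Sort by id, then by time, so the first element in each group is the earliest
ord <- testdf[order(testdf$id, testdf$time), ]

# Hypothetical helper: first non-missing element of a vector (NA if all missing)
first_non_na <- function(v) v[!is.na(v)][1]

# na.pass keeps rows where x or y is NA; the default na.omit would drop them
result <- aggregate(cbind(x, y) ~ id, data = ord,
                    FUN = first_non_na, na.action = na.pass)
print(result)
```

On the example data this gives x = 1, 3, 5 and y = 5, 1, 1 for A, B and C, matching the table in the question.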
UPDATED
Here is the implicit version.
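For the many-variables case, a sketch of an implicit variant, assuming the grouping column is called id and the ordering column time (both configurable): every other column is treated as a value column, so nothing has to be listed by name. The function first_by_time is a hypothetical helper, not part of any package. It uses the data-frame method of aggregate(), which keeps NAs in the value columns and passes them to FUN.

```r
# Example data from the question
testdf <- data.frame(id = c(rep("A", 4), rep("B", 4), rep("C", 4)),
                     x = c(NA, NA, 1, 2, 3, NA, NA, 1, 2, NA, NA, 5),
                     y = rev(c(NA, NA, 1, 2, 3, NA, NA, 1, 2, NA, NA, 5)),
                     time = c(1, 2, 3, 4, 0.1, 0.5, 10, 20, 3, 2, 1, 0.5))

# Hypothetical helper: earliest non-missing value of every non-key column per id
first_by_time <- function(df, id_col = "id", time_col = "time") {
  # Earliest row first within each subject
  ord <- df[order(df[[id_col]], df[[time_col]]), ]
  # All columns except the grouping and ordering keys are summarised
  value_cols <- setdiff(names(ord), c(id_col, time_col))
  # The data-frame method applies FUN column by column within each group
  aggregate(ord[value_cols],
            by = setNames(list(ord[[id_col]]), id_col),
            FUN = function(v) v[!is.na(v)][1])
}

result <- first_by_time(testdf)
print(result)
```

Adding more measurement columns to testdf requires no change to the call; they are picked up automatically.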