I’ve got a data frame like this:
id  date              amt
1   2012-05-03 10:33  32
2   2012-06-01 12:49  242
2   2012-06-05 00:09  43
3   2012-06-03 05:19  323
3   2012-06-08 08:45  12
4   2012-06-09 12:38  32
5   2012-06-09 10:31  53
Now I want to remove the duplicate ids so that, for each id, only the row with the earliest date is kept. The number of duplicate entries varies. I care only about the first occurrence of each id and the corresponding amt; all other entries should be removed.
I know how to do this with a loop, but I feel there must be a short and elegant solution in R.
Try something like:
newdata <- data[!duplicated(data$id), ]
EDIT: As @Aaron and others have noted below, this assumes your data is sorted so that the earliest date for each id comes first:
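A minimal sketch of sorting first and then dropping duplicates. The example data is rebuilt from the question; the intermediate name `sorted` and the UTC time zone are assumptions made only for illustration.

```r
# Example data rebuilt from the question; `data` matches the name used in the answer.
data <- data.frame(
  id   = c(1, 2, 2, 3, 3, 4, 5),
  date = as.POSIXct(
    c("2012-05-03 10:33", "2012-06-01 12:49", "2012-06-05 00:09",
      "2012-06-03 05:19", "2012-06-08 08:45", "2012-06-09 12:38",
      "2012-06-09 10:31"),
    format = "%Y-%m-%d %H:%M", tz = "UTC"  # time zone is an assumption
  ),
  amt  = c(32, 242, 43, 323, 12, 32, 53)
)

# Sort by id, then by date, so the earliest row of each id comes first.
sorted <- data[order(data$id, data$date), ]

# duplicated() flags every repeat after the first occurrence of an id,
# so negating it keeps only the earliest row per id.
newdata <- sorted[!duplicated(sorted$id), ]
```

Sorting before calling duplicated() means the result no longer depends on the order in which the rows happened to arrive.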