UPDATED AND SIMPLIFIED
I have a very large table (~7 million records) with the following structure:
temp <- read.table(header = TRUE, stringsAsFactors=FALSE,
text = "Website Datetime Rating
A 2007-12-06T14:53:07Z 1
A 2006-07-28T03:52:26Z 4
B 2006-11-02T11:06:25Z 2
C 2007-06-19T06:56:08Z 5
C 2009-11-28T22:27:58Z 2
C 2009-11-28T22:28:13Z 2")
What I want to retrieve is the unique websites with a max rating per website:
Website Rating
A 4
B 2
C 5
I tried using a for loop, but it was too slow. Is there any other way I can achieve this?
I would probably explore the data.table package, though without more details, an example solution is most likely not going to be what you need. I mention this because, in particular, there might be more than one “Rating” record per group which matches max; how would you like to deal with those cases?
I would recommend that, to get better answers, you include information like how your datetime variable might factor into your aggregation, or whether it is possible for there to be more than one “max” value per group.
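A minimal sketch of a data.table solution along these lines, assuming the question's temp data and that one full row per website is wanted (the first row that reaches the maximum). The names DT and top_one are illustrative and not from the original post:

```r
library(data.table)

# Sample data from the question
temp <- read.table(header = TRUE, stringsAsFactors = FALSE,
text = "Website Datetime Rating
A 2007-12-06T14:53:07Z 1
A 2006-07-28T03:52:26Z 4
B 2006-11-02T11:06:25Z 2
C 2007-06-19T06:56:08Z 5
C 2009-11-28T22:27:58Z 2
C 2009-11-28T22:28:13Z 2")

# Convert to a data.table (DT is an illustrative name)
DT <- as.data.table(temp)

# Within each Website, keep the single row at the position of the
# first maximum Rating; ties after the first are dropped
top_one <- DT[, .SD[which.max(Rating)], by = Website]
top_one
```

Because which.max() returns only the first position of the maximum, any further rows tied for the maximum are silently discarded, which is why the question about ties matters.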
If you want all the rows that match the max, the fix is easy:
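One way this could look, as a sketch rather than the definitive answer: compare each Rating with its group maximum instead of using which.max(). To make the effect visible, the data below includes one extra row for website C that is hypothetical and not part of the question's sample; DT and all_max are illustrative names.

```r
library(data.table)

# Question's sample data plus ONE HYPOTHETICAL tie row (last line),
# added only to show how ties are kept
temp <- read.table(header = TRUE, stringsAsFactors = FALSE,
text = "Website Datetime Rating
A 2007-12-06T14:53:07Z 1
A 2006-07-28T03:52:26Z 4
B 2006-11-02T11:06:25Z 2
C 2007-06-19T06:56:08Z 5
C 2009-11-28T22:27:58Z 2
C 2009-11-28T22:28:13Z 2
C 2010-01-01T00:00:00Z 5")

DT <- as.data.table(temp)

# Keep every row whose Rating equals the maximum within its Website
all_max <- DT[, .SD[Rating == max(Rating)], by = Website]
all_max
```

With the hypothetical tie in place, website C contributes two rows to the result, while A and B still contribute one each.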
If you do just want the Rating column, there are many ways to go about this. Following the same steps as above to convert to a data.table, you can compute max(Rating) by Website. Or, keeping the original “temp” data.frame, try aggregate(). Yet another approach uses ave().
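The three approaches just listed could be sketched as follows, assuming the question's temp data. The result names (res_dt, res_agg, res_ave) and the keep helper are illustrative, and the exact calls are one reasonable reading of each approach rather than the only one:

```r
library(data.table)

# Sample data from the question
temp <- read.table(header = TRUE, stringsAsFactors = FALSE,
text = "Website Datetime Rating
A 2007-12-06T14:53:07Z 1
A 2006-07-28T03:52:26Z 4
B 2006-11-02T11:06:25Z 2
C 2007-06-19T06:56:08Z 5
C 2009-11-28T22:27:58Z 2
C 2009-11-28T22:28:13Z 2")

# 1. data.table: one aggregated row per Website
DT <- as.data.table(temp)
res_dt <- DT[, list(Rating = max(Rating)), by = Website]

# 2. Base R aggregate() on the original data.frame
res_agg <- aggregate(Rating ~ Website, data = temp, FUN = max)

# 3. Base R ave(): compute the per-group maximum for every row,
#    keep the rows that match it, then drop duplicated ties
keep <- temp$Rating == ave(temp$Rating, temp$Website, FUN = max)
res_ave <- unique(temp[keep, c("Website", "Rating")])

res_dt
res_agg
res_ave
```

All three return one Website/Rating pair per website for this sample. On a table with millions of rows the data.table version is usually the one worth trying first, but that is worth benchmarking on the real data.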