UPDATED AND SIMPLIFIED I am having a really large table (~ 7 million records)

Editorial Team

Asked: 2026-06-16T04:22:22+00:00

I have a very large table (~7 million records) with the following structure.

temp <- read.table(header = TRUE, stringsAsFactors=FALSE,
                   text = "Website Datetime    Rating
A 2007-12-06T14:53:07Z        1
A 2006-07-28T03:52:26Z        4
B 2006-11-02T11:06:25Z        2
C 2007-06-19T06:56:08Z        5
C 2009-11-28T22:27:58Z        2
C 2009-11-28T22:28:13Z        2")

What I want to retrieve is the unique websites with a max rating per website:

Website    Rating
A    4
B    2
C    5

I tried using a for loop, but it was too slow. Is there another way to achieve this?

1 Answer

Editorial Team · Answer 1 · 2026-06-16T04:22:23+00:00

I would look at the data.table package, although without more detail the example below may not be exactly what you need. In particular, more than one “Rating” record per group may match the maximum; how would you like to handle those cases?

library(data.table)
temp <- read.table(header = TRUE, stringsAsFactors=FALSE,
                text = "Website Datetime    Rating
                        A       2012-10-9   10
                        A       2012-11-10  12
                        B       2011-10-9   5")
DT <- data.table(temp, key="Website")
DT
#    Website   Datetime Rating
# 1:       A  2012-10-9     10
# 2:       A 2012-11-10     12
# 3:       B  2011-10-9      5
DT[, list(Datetime = Datetime[which.max(Rating)], 
          Rating = max(Rating)), by = key(DT)]
#    Website   Datetime Rating
# 1:       A 2012-11-10     12
# 2:       B  2011-10-9      5

To get better answers, consider explaining how your datetime variable should factor into the aggregation and whether a group can have more than one “max” value.

If you want all the rows that match the max, the fix is easy:

DT[, list(Datetime = Datetime[Rating == max(Rating)], 
          Rating = max(Rating)), by = key(DT)]
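The difference between the two calls comes down to how the maximum is located within each group. As a minimal base R sketch (the vectors below are hypothetical and not taken from the question's data), which.max() returns only the first position of the maximum, whereas comparing against max() keeps every tied position:

```r
# Hypothetical ratings for a single website; the maximum (5) appears twice
rating   <- c(2, 5, 3, 5)
datetime <- c("2009-01-01", "2009-02-01", "2009-03-01", "2009-04-01")

# which.max() stops at the first maximum
first_max <- which.max(rating)

# Comparing against max() keeps every row that ties for the maximum
all_max <- which(rating == max(rating))

datetime[first_max]  # one value: the first tied row
datetime[all_max]    # two values: both tied rows
```

Which of the two is appropriate depends on whether tied rows should be collapsed to one or all reported.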

If you just want the Rating column, there are many ways to go about this. Following the same steps as above to convert the data from the question to a data.table, try:

DT[, list(Rating = max(Rating)), by = key(DT)]
#    Website Rating
# 1:       A      4
# 2:       B      2
# 3:       C      5

Or, keeping the original “temp” data.frame, try aggregate():

aggregate(Rating ~ Website, temp, max)
#   Website Rating
# 1       A      4
# 2       B      2
# 3       C      5
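As one more base R option in the same spirit (not part of the original answer, just a sketch), tapply() computes the per-website maximum as a named vector, which can then be turned back into a two-column data.frame. The names max_rating and result are illustrative only:

```r
# Data from the question
temp <- read.table(header = TRUE, stringsAsFactors = FALSE,
                   text = "Website Datetime    Rating
A 2007-12-06T14:53:07Z        1
A 2006-07-28T03:52:26Z        4
B 2006-11-02T11:06:25Z        2
C 2007-06-19T06:56:08Z        5
C 2009-11-28T22:27:58Z        2
C 2009-11-28T22:28:13Z        2")

# One maximum Rating per Website, named by website
max_rating <- tapply(temp$Rating, temp$Website, max)

# Rebuild a data.frame with the same two columns aggregate() returns
result <- data.frame(Website = names(max_rating),
                     Rating = as.vector(max_rating),
                     stringsAsFactors = FALSE)
result
```

Whether this is faster than aggregate() on 7 million rows would need to be measured on the real data.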

Yet another approach, using ave:

temp[with(temp, Rating == ave(Rating, Website, FUN=max)), ]
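To make the ave() one-liner easier to follow, here is the same idea split into steps on the data from the question. The intermediate names group_max and keep are introduced only for illustration. Note that this form keeps every row that ties for the maximum, and it retains the Datetime column:

```r
# Data from the question
temp <- read.table(header = TRUE, stringsAsFactors = FALSE,
                   text = "Website Datetime    Rating
A 2007-12-06T14:53:07Z        1
A 2006-07-28T03:52:26Z        4
B 2006-11-02T11:06:25Z        2
C 2007-06-19T06:56:08Z        5
C 2009-11-28T22:27:58Z        2
C 2009-11-28T22:28:13Z        2")

# ave() returns a vector as long as the input, holding each row's group maximum
group_max <- ave(temp$Rating, temp$Website, FUN = max)

# A row is kept when its own Rating equals the maximum of its Website
keep <- temp$Rating == group_max

result <- temp[keep, ]
result
```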
