There is a nice explanation here describing how to eliminate duplicates in a data

Question

Asked: May 26, 2026 (2026-05-26T11:43:21+00:00)

There is a nice explanation here describing how to eliminate duplicates in a data frame by picking the maximum variable.

I can also see how this can be applied to pick the duplicate with the minimum variable.

My question now is: how do I compute the mean of each set of duplicates?

For example:

z <- data.frame(id=c(1,1,2,2,3,4),var=c(2,4,1,3,5,2))
# id var
#  1   2
#  1   4
#  2   1
#  2   3
#  3   5
#  4   2

I would like the output:

# id var
#  1   3     mean(c(2,4))
#  2   2     mean(c(1,3))
#  3   5
#  4   2

My current code is:

averages<-do.call(rbind,lapply(split(z,z$id),function(chunk) mean(chunk$var)))
z<-z[order(z$id),]
z<-z[!duplicated(z$id),]
z$var<-averages

My code runs very slowly: it takes about 10 times longer than the method for picking the maximum. How do I optimize it?

1 Answer

Editorial Team · Answer 1 · 2026-05-26T11:43:22+00:00

I would use a combination of ave and unique:

z <- data.frame(id=rep(c(1,1,2,2,3,4),1e5),var=rnorm(6e5))
z$var <- ave(z$var, z$id, FUN=mean)
z <- unique(z)
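
To see what these two steps do, here is a minimal sketch on the six-row example from the question rather than the large test data. It assumes the data frame has only the id and var columns, so that rows in the same group become identical once var holds the group mean.

```r
z <- data.frame(id = c(1, 1, 2, 2, 3, 4), var = c(2, 4, 1, 3, 5, 2))

# ave() returns a vector as long as z$var in which every element is
# replaced by the mean of its id group.
z$var <- ave(z$var, z$id, FUN = mean)

# Rows within a group are now identical, so unique() keeps one per id.
z <- unique(z)
print(z)
# ids 1, 2, 3, 4 with var 3, 2, 5, 2 (original row names 1, 3, 5, 6 are kept)
```

If the data frame had further columns that differ within an id, unique() would not collapse those rows, and z[!duplicated(z$id), ] as in the question would be the safer last step.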

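UPDATE: after actually timing the solution, here’s something that’s a little faster: split the data frame by id and take the column means of each piece. The first timing below is for that approach, the second for ave plus unique.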

z <- data.frame(id=rep(c(1,1,2,2,3,4),1e5),var=rnorm(6e5))
system.time({
  averages <- t(sapply(split(z,z$id), function(x) sapply(x,mean)))
})
#    user  system elapsed 
#    1.32    0.00    1.33 
system.time({
  z$var <- ave(z$var, z$id, FUN=mean)
  z <- unique(z)
})
#    user  system elapsed 
#    4.33    0.02    4.37
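
Note that the split/sapply version returns a numeric matrix, not a data frame. Below is a minimal sketch of converting it back, again on the small example from the question. The names result and result_tapply are only illustrative, and the tapply variant is an alternative I have not timed, so treat it as something to benchmark on your own data rather than a known improvement.

```r
z <- data.frame(id = c(1, 1, 2, 2, 3, 4), var = c(2, 4, 1, 3, 5, 2))

# Column means per id; the result is a matrix with one row per id
# and the columns "id" and "var".
averages <- t(sapply(split(z, z$id), function(x) sapply(x, mean)))

# Convert the matrix back to a data frame and drop the id-based row names.
result <- as.data.frame(averages)
rownames(result) <- NULL

# Untimed alternative: tapply() computes the mean of var for each id directly.
means <- tapply(z$var, z$id, mean)
result_tapply <- data.frame(id = as.numeric(names(means)),
                            var = as.vector(means))

print(result)
print(result_tapply)
```

Both data frames hold one row per id with the group mean in var, sorted by id.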
