Question

Asked: May 25, 2026

Here is an example dataframe:

set.seed(0)
x1 <- c(1, 1, 1, 1, 1, 2, 2, 2, 2)
x2 <- c(1, 1, 0, 0, 0, 1, 1, 1, 1)
x3 <- c(1, 1, 2, 2, 4, 1, 1, 2, 1)
n  <- c(1, 1, 1, 5, 5, 1, 1, 1, 1)
y <- rnorm(9)

mydf <- data.frame(x1, x2, x3, n, y)

What I would like to do is:

1. identify rows with n = 1 that share identical values of (x1, x2, x3);
2. return a single row for each such subset, with y = mean(y) and n = length(y);
3. keep the other rows unchanged.

For example, the new data frame would be:

x1 <- c(1,            1,    1,    1,    2,                 2)
x2 <- c(1,            0,    0,    0,    1,                 1)
x3 <- c(1,            2,    2,    4,    1,                 2)
n  <- c(2,            1,    5,    5,    3,                 1)
y  <- c(mean(y[1:2]), y[3], y[4], y[5], mean(y[c(6:7,9)]), y[8])

newdf <- data.frame(x1, x2, x3, n, y)

I can figure this out with conditionals and loops, but I would prefer to learn a more elegant way to do this.

1 Answer

Editorial Team · Answer 1 · 2026-05-25T02:55:34+00:00

By “identical values in other columns”, I take it you mean that the rows in each subset all have the same value of x1, the same value of x2, and the same value of x3, not that x1 is equal to x2. Thanks for the example; it made clear what you meant.

library("plyr")

To get parts one and two (collapse the n == 1 rows by group):

ddply(mydf[mydf$n==1,], .(x1, x2, x3), summarise, n = length(y), y = mean(y))

This can be combined with rbind() with the rows of mydf where n != 1 to get the result you described:

rbind(
  ddply(mydf[mydf$n==1,], .(x1, x2, x3), summarise, n = length(y), y = mean(y)),
  mydf[mydf$n!=1,]
)
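If you would rather not depend on plyr, the same collapse-then-rbind idea can be sketched in base R with aggregate() and merge(). This is a swapped-in alternative, not the ddply() approach above; the names singles, means, counts, collapsed and result are only illustrative. Like the rbind() version, it does not preserve your row order.

```r
# Rebuild the example data from the question
set.seed(0)
mydf <- data.frame(
  x1 = c(1, 1, 1, 1, 1, 2, 2, 2, 2),
  x2 = c(1, 1, 0, 0, 0, 1, 1, 1, 1),
  x3 = c(1, 1, 2, 2, 4, 1, 1, 2, 1),
  n  = c(1, 1, 1, 5, 5, 1, 1, 1, 1),
  y  = rnorm(9)
)

# Only the n == 1 rows get collapsed
singles <- mydf[mydf$n == 1, ]

# Mean of y and number of rows for each (x1, x2, x3) group
means  <- aggregate(y ~ x1 + x2 + x3, data = singles, FUN = mean)
counts <- aggregate(n ~ x1 + x2 + x3, data = singles, FUN = length)
collapsed <- merge(counts, means, by = c("x1", "x2", "x3"))

# Append the untouched rows (n != 1); rbind() matches columns by name
result <- rbind(collapsed, mydf[mydf$n != 1, ])
result
```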

This doesn’t return the rows in the order you listed. If the order really matters, you can add an auxiliary sorting variable:

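# Remember each row's original position
mydf$order <- seq_len(nrow(mydf))
newdf <- rbind(
  # Each collapsed group keeps the position of its first row
  ddply(mydf[mydf$n == 1, ], .(x1, x2, x3), summarise,
    n = length(y), y = mean(y), order = min(order)),
  mydf[mydf$n != 1, ]
)
# Restore the original order, then drop the helper column
newdf <- newdf[order(newdf$order), ]
newdf$order <- NULL
rownames(newdf) <- NULL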
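If you want the original row order without a helper column, one base-R sketch uses ave() to overwrite y and n within each group of n == 1 rows, and duplicated() to keep only the first row of each group. Again, this is an alternative technique rather than what the code above does; single, grp and keep are illustrative names.

```r
# Rebuild the example data from the question
set.seed(0)
mydf <- data.frame(
  x1 = c(1, 1, 1, 1, 1, 2, 2, 2, 2),
  x2 = c(1, 1, 0, 0, 0, 1, 1, 1, 1),
  x3 = c(1, 1, 2, 2, 4, 1, 1, 2, 1),
  n  = c(1, 1, 1, 5, 5, 1, 1, 1, 1),
  y  = rnorm(9)
)

single <- mydf$n == 1                     # rows eligible for collapsing
grp <- mydf[single, c("x1", "x2", "x3")]  # their grouping keys

newdf <- mydf
# Replace y by its group mean and n by the group size, on every row of the group
newdf$y[single] <- ave(mydf$y[single], grp$x1, grp$x2, grp$x3, FUN = mean)
newdf$n[single] <- ave(mydf$y[single], grp$x1, grp$x2, grp$x3, FUN = length)

# Keep every n != 1 row, but only the first row of each n == 1 group
keep <- rep(TRUE, nrow(mydf))
keep[single] <- !duplicated(grp)
newdf <- newdf[keep, ]
rownames(newdf) <- NULL
newdf
```

Because nothing is reordered, the rows come out in the same order as the newdf in your example.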