I can achieve this task, but I feel like there must be a best


Asked: May 25, 2026 (2026-05-25T16:41:57+00:00)


I can achieve this task, but I feel like there must be a “best” (slickest, most compact, clearest-code, fastest?) way of doing it and have not figured it out so far …

For a specified set of categorical factors I want to construct a table of means and variances by group.

generate data:

set.seed(1001)
d <- expand.grid(f1=LETTERS[1:3],f2=letters[1:3],
                 f3=factor(as.character(as.roman(1:3))),rep=1:4)
d$y <- runif(nrow(d))
d$z <- rnorm(nrow(d))

desired output:

  f1 f2  f3    y.mean      y.var
1  A  a   I 0.6502307 0.09537958
2  A  a  II 0.4876630 0.11079670
3  A  a III 0.3102926 0.20280568
4  A  b   I 0.3914084 0.05869310
5  A  b  II 0.5257355 0.21863126
6  A  b III 0.3356860 0.07943314
... etc. ...

using aggregate/merge:

library(reshape)
m1 <- aggregate(y~f1*f2*f3,data=d,FUN=mean)
m2 <- aggregate(y~f1*f2*f3,data=d,FUN=var)
mvtab <- merge(rename(m1,c(y="y.mean")),
      rename(m2,c(y="y.var")))
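For comparison, the two aggregate calls and the merge can probably be collapsed into one aggregate call whose function returns both statistics. This is only a sketch in base R: the names agg and mvtab_base are made up for illustration, and the do.call(data.frame, ...) step is there because aggregate stores the two statistics as a single matrix column named y.

```r
# Regenerate the example data so the sketch is self-contained
set.seed(1001)
d <- expand.grid(f1 = LETTERS[1:3], f2 = letters[1:3],
                 f3 = factor(as.character(as.roman(1:3))), rep = 1:4)
d$y <- runif(nrow(d))
d$z <- rnorm(nrow(d))

# One pass: the function returns a named vector with both statistics
agg <- aggregate(y ~ f1 + f2 + f3, data = d,
                 FUN = function(x) c(mean = mean(x), var = var(x)))

# agg$y is a two-column matrix; flatten it into columns y.mean and y.var
mvtab_base <- do.call(data.frame, agg)

# Optional: sort so that f3 varies fastest, as in the desired output
mvtab_base <- mvtab_base[order(mvtab_base$f1, mvtab_base$f2, mvtab_base$f3), ]
```

The column names y.mean and y.var come from combining the matrix column name with the names of the returned vector, so no separate rename step should be needed.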

using ddply/summarise (possibly best but haven’t been able to make it work):

mvtab2 <- ddply(subset(d,select=-c(z,rep)),
                .(f1,f2,f3),
                summarise,numcolwise(mean),numcolwise(var))

results in

Error in output[[var]][rng] <- df[[var]] : 
  incompatible types (from closure to logical) in subassignment type fix

using melt/cast (maybe best?)

mvtab3 <- cast(melt(subset(d,select=-c(z,rep)),
          id.vars=1:3),
     ...~.,fun.aggregate=c(mean,var))
## now have to drop "variable"
mvtab3 <- subset(mvtab3,select=-variable)
## also should rename response variables

Won’t (?) work in reshape2. Explaining ...~. to someone could be tricky!


1 Answer

Editorial Team · Answer 1 · 2026-05-25T16:41:58+00:00

I’m a bit puzzled. Does this not work:

library(plyr)  ## ddply, summarise and .() come from plyr
mvtab2 <- ddply(d,.(f1,f2,f3),
            summarise,y.mean = mean(y),y.var = var(y))

This gives me something like this:

   f1 f2  f3    y.mean       y.var
1   A  a   I 0.6502307 0.095379578
2   A  a  II 0.4876630 0.110796695
3   A  a III 0.3102926 0.202805677
4   A  b   I 0.3914084 0.058693103
5   A  b  II 0.5257355 0.218631264

This is in the right form, but it looks like the values are different from what you specified.

Edit

Here’s how to make your version with numcolwise work:

mvtab2 <- ddply(subset(d,select=-c(z,rep)),.(f1,f2,f3),summarise,
                y.mean = numcolwise(mean)(piece),
                y.var = numcolwise(var)(piece))

You forgot to pass the actual data to numcolwise: numcolwise(mean) returns a function, which still has to be called on a data frame. The other half is a little ddply trick: each subset is called piece internally. (Hadley points out in the comments that this shouldn’t be relied upon, as it may change in future versions of plyr.)
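If relying on the internal name piece is a concern, one way around it is to give ddply an ordinary function that takes each subset as its argument. This is a minimal sketch, assuming plyr is installed; the argument name grp and the result name mvtab4 are arbitrary choices for illustration.

```r
library(plyr)

# Regenerate the example data so the sketch is self-contained
set.seed(1001)
d <- expand.grid(f1 = LETTERS[1:3], f2 = letters[1:3],
                 f3 = factor(as.character(as.roman(1:3))), rep = 1:4)
d$y <- runif(nrow(d))
d$z <- rnorm(nrow(d))

# ddply calls the function once per f1/f2/f3 combination and
# row-binds the one-row data frames it returns
mvtab4 <- ddply(d, .(f1, f2, f3), function(grp) {
  data.frame(y.mean = mean(grp$y), y.var = var(grp$y))
})
```

Because the subset is passed in explicitly, nothing here depends on how ddply names its pieces internally.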
