I can achieve this task, but I feel like there must be a “best” (slickest, most compact, clearest-code, fastest?) way of doing it and have not figured it out so far …
For a specified set of categorical factors I want to construct a table of means and variances by group.
generate data:
set.seed(1001)
d <- expand.grid(f1=LETTERS[1:3],f2=letters[1:3],
f3=factor(as.character(as.roman(1:3))),rep=1:4)
d$y <- runif(nrow(d))
d$z <- rnorm(nrow(d))
desired output:
f1 f2 f3 y.mean y.var
1 A a I 0.6502307 0.09537958
2 A a II 0.4876630 0.11079670
3 A a III 0.3102926 0.20280568
4 A b I 0.3914084 0.05869310
5 A b II 0.5257355 0.21863126
6 A b III 0.3356860 0.07943314
... etc. ...
using aggregate/merge:
library(reshape)
m1 <- aggregate(y~f1*f2*f3,data=d,FUN=mean)
m2 <- aggregate(y~f1*f2*f3,data=d,FUN=var)
mvtab <- merge(rename(m1,c(y="y.mean")),
rename(m2,c(y="y.var")))
using ddply/summarise (possibly best but haven’t been able to make it work):
mvtab2 <- ddply(subset(d,select=-c(z,rep)),
.(f1,f2,f3),
summarise,numcolwise(mean),numcolwise(var))
results in
Error in output[[var]][rng] <- df[[var]] :
incompatible types (from closure to logical) in subassignment type fix
using melt/cast (maybe best?)
mvtab3 <- cast(melt(subset(d,select=-c(z,rep)),
id.vars=1:3),
...~.,fun.aggregate=c(mean,var))
## now have to drop "variable"
mvtab3 <- subset(mvtab3,select=-variable)
## also should rename response variables
Won’t (?) work in reshape2. Explaining ...~. to someone could be tricky!
I’m a bit puzzled. Does this not work:
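A minimal sketch of the kind of call meant here, assuming plyr is installed and using the question's data d; the column names y.mean and y.var are taken from the desired output in the question:

```r
library(plyr)

# Recreate the example data from the question
set.seed(1001)
d <- expand.grid(f1 = LETTERS[1:3], f2 = letters[1:3],
                 f3 = factor(as.character(as.roman(1:3))), rep = 1:4)
d$y <- runif(nrow(d))
d$z <- rnorm(nrow(d))

# One row per f1/f2/f3 combination, with the mean and variance of y
mvtab2 <- ddply(d, .(f1, f2, f3), summarise,
                y.mean = mean(y), y.var = var(y))
head(mvtab2)
```

Because summarise evaluates each expression inside the current group, there is no need to drop z and rep first; they are simply ignored.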
This gives me something like this:
Which is in the right form, but it looks like the values are different than what you specified.
Edit
Here’s how to make your version with numcolwise work:
You forgot to pass the actual data to numcolwise. And then there’s the little ddply trick that each piece is called piece internally. (Which Hadley points out in the comments shouldn’t be relied upon, as it may change in future versions of plyr.)
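Since the internal name shouldn’t be relied upon, one way to sidestep it is to hand ddply an anonymous function, so that the name of each chunk is chosen explicitly. This is a sketch, assuming plyr is installed; the argument name piece below is just a local choice, not the internal one:

```r
library(plyr)

# Recreate the example data from the question
set.seed(1001)
d <- expand.grid(f1 = LETTERS[1:3], f2 = letters[1:3],
                 f3 = factor(as.character(as.roman(1:3))), rep = 1:4)
d$y <- runif(nrow(d))
d$z <- rnorm(nrow(d))

# numcolwise(f) builds a function that applies f to every numeric column
# of a data frame; after dropping z and rep, the only numeric column is y.
mvtab2b <- ddply(subset(d, select = -c(z, rep)), .(f1, f2, f3),
                 function(piece) {
                   data.frame(y.mean = numcolwise(mean)(piece)$y,
                              y.var  = numcolwise(var)(piece)$y)
                 })
head(mvtab2b)
```

With more than one numeric response you would keep the whole data frames returned by numcolwise(mean)(piece) and numcolwise(var)(piece) and rename their columns, rather than extracting $y.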