My data frame looks like this:
```
> df
  id u.1t u.2 v.1 v.2
1  A    1  NA   5  NA
2  A    2  NA   4   6
3  A    1   4   5  NA
4  B   10  13  40  NA
5  B   10  12  42  NA
6  B   10  NA  41  NA
```
and I would like to know the id-specific means of the `u.*` columns and of the `v.*` columns, respectively, like this:
```
> mean
  id u.mean v.mean
1  A      2      5
2  B     11     41
```
This is the data:
```r
df <- data.frame(id   = c("A", "A", "A", "B", "B", "B"),
                 u.1t = c(1, 2, 1, 10, 10, 10),
                 u.2  = c(NA, NA, 4, 13, 12, NA),
                 v.1  = c(5, 4, 5, 40, 42, 41),
                 v.2  = c(NA, 6, NA, NA, NA, NA))
```
Because of the NAs, the overall mean is not equal to the mean of the row means or to the mean of the column means, which is the problem here.
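As a quick check of that claim, here is id A's `u.*` data (values copied from the data frame above) averaged three ways. The variable names are only for illustration.

```r
# id A's u.* values, copied from rows 1-3 of the data frame above
u1 <- c(1, 2, 1)     # u.1t
u2 <- c(NA, NA, 4)   # u.2

# Pool every non-NA cell, then average: (1 + 2 + 1 + 4) / 4 = 2
pooled_mean <- mean(c(u1, u2), na.rm = TRUE)

# Average each column first, then average those: (4/3 + 4) / 2 = 8/3
mean_of_cols <- mean(c(mean(u1), mean(u2, na.rm = TRUE)))

# Average each row first (ignoring NAs), then average those: (1 + 2 + 2.5) / 3 = 11/6
mean_of_rows <- mean(rowMeans(cbind(u1, u2), na.rm = TRUE))

print(c(pooled = pooled_mean, cols = mean_of_cols, rows = mean_of_rows))
```

Only the pooled mean gives the desired value of 2, so all `u.*` cells for an id have to be combined before averaging.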
I thought this would be a job for `by`, but I can't seem to get `by` to do anything other than column-wise operations.
Help is greatly appreciated. Thanks!
If you want to use `by`, try something like this:

The output is a bit ugly. While you can tidy it up, I would use the `plyr` package:

This doesn't quite achieve what you are after, as it takes the average of each column: it doesn't combine the `u.*` and `v.*` columns. To do that, I would `melt` the data first and then use `plyr`:

If you choose to, you can `cast` this back:
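Here is a minimal base-R sketch of the two ideas described above. It is not the answer's original code: the helper `pooled` is a hypothetical name, and the second approach swaps base `stack()` and `tapply()` in for `melt`, `plyr` and `cast`, so no extra packages are assumed.

```r
# The question's data
df <- data.frame(id   = c("A", "A", "A", "B", "B", "B"),
                 u.1t = c(1, 2, 1, 10, 10, 10),
                 u.2  = c(NA, NA, 4, 13, 12, NA),
                 v.1  = c(5, 4, 5, 40, 42, 41),
                 v.2  = c(NA, 6, NA, NA, NA, NA))

# Approach 1: by() hands FUN all rows of one id as a data frame, so FUN can
# pool every u.* (or v.*) cell before averaging, dropping the NAs.
# `pooled` is a hypothetical helper written for this sketch.
pooled <- function(d) {
  c(u.mean = mean(unlist(d[grep("^u\\.", names(d))]), na.rm = TRUE),
    v.mean = mean(unlist(d[grep("^v\\.", names(d))]), na.rm = TRUE))
}
by_res <- by(df, df$id, pooled)
by_mat <- do.call(rbind, by_res)   # one row per id, rows named "A" and "B"

# Approach 2: the "melt first" idea in base R (stack() instead of melt).
# Stack the measurement columns into long format, tag each value with its
# prefix, then average per id and prefix; tapply() also plays the role of cast.
long <- stack(df[-1])                          # columns: values, ind
long$id     <- rep(df$id, times = ncol(df) - 1)
long$prefix <- sub("\\..*$", "", long$ind)     # "u.1t" -> "u", "v.2" -> "v"
wide <- tapply(long$values, list(long$id, long$prefix), mean, na.rm = TRUE)

# expected: A -> 2 and 5, B -> 11 and 41 (the means asked for in the question)
print(by_mat)
print(wide)
```

Note that `by()` itself is not restricted to column-wise work: it passes each id's rows to `FUN` as a data frame, and `FUN` decides how to summarise them.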