My data frame looks like this: > df id u.1t u.2 v.1 v.2 1

Question

Asked: June 11, 2026 (2026-06-11T22:17:18+00:00)

My data frame looks like this:

> df
  id u.1t u.2 v.1 v.2
1  A    1  NA   5  NA
2  A    2  NA   4   6
3  A    1   4   5  NA
4  B   10  13  40  NA
5  B   10  12  42  NA
6  B   10  NA  41  NA

I would like to compute, for each id, the mean of all u.* values and the mean of all v.* values, like this:

> mean
  id u.mean v.mean
1  A      2      5
2  B     11     41

Here is the data:

df<-data.frame(id=c("A","A","A","B","B","B"),u.1t=c(1,2,1,10,10,10),u.2=c(NA,NA,4,13,12,NA),v.1=c(5,4,5,40,42,41),v.2=c(NA,6,NA,NA,NA,NA))

Because of the NAs, the overall mean of a group of columns is not equal to the mean of its row means or of its column means, which is the problem here.
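
To make the discrepancy concrete, here is a small base R check for the u.* columns of id A. It is only a sketch; the names a, pooled, col_means and mean_of_col_means are introduced here for illustration.

```r
df <- data.frame(id = c("A","A","A","B","B","B"),
                 u.1t = c(1,2,1,10,10,10),
                 u.2 = c(NA,NA,4,13,12,NA),
                 v.1 = c(5,4,5,40,42,41),
                 v.2 = c(NA,6,NA,NA,NA,NA))

# The u.* columns for id A only
a <- df[df$id == "A", c("u.1t", "u.2")]

# Pooled mean: all non-missing u values together, (1 + 2 + 1 + 4) / 4
pooled <- mean(unlist(a), na.rm = TRUE)

# Mean of the column means: u.1t averages 4/3, u.2 averages 4
col_means <- colMeans(a, na.rm = TRUE)
mean_of_col_means <- mean(col_means)

pooled             # 2
mean_of_col_means  # 8/3, about 2.67
```

The two numbers differ because u.2 contributes only one value for id A, yet its column mean carries the same weight as the three values in u.1t.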

I thought this would be a job for by, but I can’t get by to do anything other than column-wise operations.

Any help is greatly appreciated. Thanks!


1 Answer

Editorial Team · Answer 1 · 2026-06-11T22:17:19+00:00

If you want to use by, try something like this (x here is your data frame df):

x <- df
by(x, x$id, function(x) colMeans(x[,-1], na.rm=TRUE))

The output is a bit ugly. While you can tidy it up, I would use the plyr package:

library(plyr)
ddply(x, .(id), function(x) colMeans(x[,-1], na.rm=TRUE))

This doesn’t quite achieve what you are after, as it takes the average of each column: it doesn’t combine the u.* and v.* columns. To do that, I would melt the data first and then use plyr:

library(reshape2)
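y <- melt(x, id.vars = "id")                 # long format: one row per id, column and value
y$variable <- gsub("\\..*", '', y$variable)  # keep only the prefix before the first dot ("u" or "v")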
y
#   id variable value
#1   A        u     1
#2   A        u     2
#3   A        u     1
#4   B        u    10
#5   B        u    10
#6   B        u    10
#7   A        u    NA
#    (etc)

z <- ddply(y, .(id, variable), summarise, mean = mean(value, na.rm=TRUE))
z
#  id variable mean
#1  A        u    2
#2  A        v    5
#3  B        u   11
#4  B        v   41

If you choose to, you can cast this back:

dcast(z, id ~ variable, value.var = "mean")
#  id  u  v
#1  A  2  5
#2  B 11 41
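
For comparison, the same pooled means can be sketched in base R without plyr or reshape2. This is an illustration rather than a definitive approach: it assumes the text before the first dot in each column name identifies the group, and the names value_cols, groups and pooled_means are made up here.

```r
df <- data.frame(id = c("A","A","A","B","B","B"),
                 u.1t = c(1,2,1,10,10,10),
                 u.2 = c(NA,NA,4,13,12,NA),
                 v.1 = c(5,4,5,40,42,41),
                 v.2 = c(NA,6,NA,NA,NA,NA))

# Group label for each value column: the prefix before the first dot ("u" or "v")
value_cols <- setdiff(names(df), "id")
groups <- sub("\\..*", "", value_cols)

# For each id, pool all values whose columns share a prefix and average them
pooled_means <- do.call(rbind, lapply(split(df, df$id), function(d) {
  means <- sapply(unique(groups), function(g) {
    mean(unlist(d[value_cols[groups == g]]), na.rm = TRUE)
  })
  data.frame(id = d$id[1], as.list(means))
}))

# Rename to match the column names asked for in the question
names(pooled_means)[-1] <- paste0(names(pooled_means)[-1], ".mean")
pooled_means
```

Pooling with unlist before calling mean is what keeps each non-missing value equally weighted, which is the same effect melt achieves above.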
