I am starting with 3 large data tables (named A1,A2,A3). Each table has 4 data columns (V1-V4), 1 “Date” column that is constant across all three tables, and thousands of rows.
Here is some dummy data that approximates my tables.
A1.V1<-c(1,2,3,4)
A1.V2<-c(2,4,6,8)
A1.V3<-c(1,3,5,7)
A1.V4<-c(1,2,3,4)
A2.V1<-c(1,2,3,4)
A2.V2<-c(2,4,6,8)
A2.V3<-c(1,3,5,7)
A2.V4<-c(1,2,3,4)
A3.V1<-c(1,2,3,4)
A3.V2<-c(2,4,6,8)
A3.V3<-c(1,3,5,7)
A3.V4<-c(1,2,3,4)
Date<-c(2001,2002,2003,2004)
DF<-data.frame(Date, A1.V1,A1.V2,A1.V3,A1.V4,A2.V1,A2.V2,A2.V3,A2.V4,A3.V1,A3.V2,A3.V3,A3.V4)
So this is what my data frame ends up looking like:
Date A1.V1 A1.V2 A1.V3 A1.V4 A2.V1 A2.V2 A2.V3 A2.V4 A3.V1 A3.V2 A3.V3 A3.V4
1 2001 1 2 1 1 1 2 1 1 1 2 1 1
2 2002 2 4 3 2 2 4 3 2 2 4 3 2
3 2003 3 6 5 3 3 6 5 3 3 6 5 3
4 2004 4 8 7 4 4 8 7 4 4 8 7 4
My goal is to calculate the row mean for each of the matching columns from each data table. So in this instance, I would want row means for all columns ending in V1, all columns ending in V2, all columns ending in V3 and all columns ending in V4.
The end result would look like this:
V1 V2 V3 V4
2001 1 2 1 1
2002 2 4 3 2
2003 3 6 5 3
2004 4 8 7 4
So my question is, how do I go about calculating row means based on a partial match in the column name?
Thanks
I’m sure it can be done more elegantly, but this is one possibility that seems to work.
I should also describe what I did.
First, I declared the substrings of the column names to be partially matched.
Then I used the grep command to select the columns of your data frame whose names match a particular substring. The apply command calculates the row means, and lapply repeats this for each substring.
Using do.call and cbind (as suggested by DWin), we concatenate the individual columns.
Finally, we set the names of the result from the Date column of the original data frame.
The problem can be solved more elegantly and efficiently; see the solutions by DWin and Maiasaura.
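A minimal sketch of the steps described above, run against the dummy DF from the question. The names suffixes and row_means are illustrative and not taken from the original answer, and using the Date values as row names is an assumption made so the result matches the desired output shown in the question.

```r
# Rebuild the dummy data frame from the question.
Date <- c(2001, 2002, 2003, 2004)
DF <- data.frame(
  Date,
  A1.V1 = c(1, 2, 3, 4), A1.V2 = c(2, 4, 6, 8),
  A1.V3 = c(1, 3, 5, 7), A1.V4 = c(1, 2, 3, 4),
  A2.V1 = c(1, 2, 3, 4), A2.V2 = c(2, 4, 6, 8),
  A2.V3 = c(1, 3, 5, 7), A2.V4 = c(1, 2, 3, 4),
  A3.V1 = c(1, 2, 3, 4), A3.V2 = c(2, 4, 6, 8),
  A3.V3 = c(1, 3, 5, 7), A3.V4 = c(1, 2, 3, 4)
)

# Substrings to match at the end of the column names (illustrative name).
suffixes <- c("V1", "V2", "V3", "V4")

# For each suffix: pick the matching columns with grep, then take row means.
row_means <- lapply(suffixes, function(s) {
  matched <- grep(paste0("\\.", s, "$"), names(DF))
  apply(DF[, matched, drop = FALSE], 1, mean)
})

# Bind the per-suffix vectors into one matrix, one column per suffix.
res <- do.call(cbind, row_means)
colnames(res) <- suffixes

# Assumption: label the rows with the Date values, as in the desired output.
rownames(res) <- DF$Date
res
```

If speed matters on thousands of rows, replacing the apply(..., 1, mean) call with rowMeans(DF[, matched, drop = FALSE]) should give the same numbers without the row-by-row loop.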