I want to split a data frame based on two columns, but I want the output to be a 2-D matrix of data frames, rather than a flat list of data frames. I can achieve what I want using by() and subset but I was told (I think by Ripley) that one should avoid using subset in package development. Is there an elegant alternative (perhaps using split) that preserves the dimnames?
# sample data
df <- data.frame(x=rnorm(20), y=rnorm(20), v1=rep(letters[1:5],each=4), v2=rep(LETTERS[6:9]))
# what I did previously
submat <- by(df, list(df$v1,df$v2), subset)
dim(submat) # 5 x 4
dimnames(submat) # "a" "b" "c" "d" "e" ; "F" "G" "H" "I"
To get what you ask for, a matrix of dataframes, use tapply with a function that returns the matching dataframe subset; the row and column names of the result match the factor levels.
Matrices with lists as entries are printed to show only the object type and the number of entries (columns in this case). Notice that each entry is a list with one item, so that the dataframe attribute is maintained, but you need to "drill down" to get the treasure.
Edit: added the attributes of dfmat:
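The code that went with this answer is not reproduced here, so what follows is only a minimal sketch of the tapply approach as described, not necessarily the exact call the answer used. It assumes the sample df from the question; the set.seed call, the explicit times = 5, and the names cell and inner are additions made for illustration.

```r
# Sample data as in the question: each (v1, v2) combination occurs exactly once.
set.seed(1)  # assumption: added only so the sketch is reproducible
df <- data.frame(x = rnorm(20), y = rnorm(20),
                 v1 = rep(letters[1:5], each = 4),
                 v2 = rep(LETTERS[6:9], times = 5))

# Apply over the row indices; the function returns the matching rows of df.
# Each result is a dataframe rather than a scalar, so tapply cannot simplify
# and returns a list with dim and dimnames, i.e. a matrix of dataframes.
dfmat <- tapply(seq_len(nrow(df)), list(df$v1, df$v2),
                function(idx) df[idx, , drop = FALSE])

dim(dfmat)         # 5 4
dimnames(dfmat)    # the levels of v1 and v2
attributes(dfmat)  # dim and dimnames only

# "Drilling down": single brackets give a length-one list,
# double brackets give the dataframe itself.
cell  <- dfmat["a", "F"]
inner <- dfmat[["a", "F"]]
```

Unlike the by() call in the question, this does not rely on subset, and the result carries the factor levels as dimnames. Combinations of v1 and v2 that have no rows would be left as NULL entries.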