I want to split a data frame based on two columns, but I want

Asked: June 15, 2026

I want to split a data frame based on two columns, but I want the output to be a 2-D matrix of data frames, rather than a flat list of data frames. I can achieve what I want using by() and subset but I was told (I think by Ripley) that one should avoid using subset in package development. Is there an elegant alternative (perhaps using split) that preserves the dimnames?

# sample data
df <- data.frame(x=rnorm(20), y=rnorm(20), v1=rep(letters[1:5],each=4), v2=rep(LETTERS[6:9]))

# what I did previously
submat <- by(df, list(df$v1,df$v2), subset)
dim(submat) # 5 x 4
dimnames(submat) # "a" "b" "c" "d" "e" ; "F" "G" "H" "I"

1 Answer

Editorial Team · Answer 1 · 2026-06-15T11:58:51+00:00

To get what you ask for, a matrix of data frames, use tapply over the row indices with a function that returns the matching subset of the data frame. The result is a matrix whose dimnames match the factor levels.

> dfmat <- with(df, tapply(seq_len(nrow(df)), list(v1, v2), function(idx) df[idx, ]))
> dfmat[1,1]  # single-bracket matrix indexing returns a one-element list holding the data frame
[[1]]
           x         y v1 v2
1 -0.5604756 -1.067824  a  F

> dfmat
  F      G      H      I     
a List,4 List,4 List,4 List,4
b List,4 List,4 List,4 List,4
c List,4 List,4 List,4 List,4
d List,4 List,4 List,4 List,4
e List,4 List,4 List,4 List,4

Matrices with lists as entries are printed showing only the object type and the number of entries in each cell (the number of columns in this case). Notice that dfmat[1,1] returns a list with one item, so the data.frame class of the cell is maintained, but you need to “drill down” with [[ to get the treasure.
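As a minimal sketch of that drill-down, the snippet below rebuilds the sample data and compares single- and double-bracket indexing. The set.seed(123) call and the names cell_list, cell_df and by_name are assumptions added for illustration; they are not part of the original answer.

```r
# Sample data from the question; the seed is an assumption for reproducibility
set.seed(123)
df <- data.frame(x = rnorm(20), y = rnorm(20),
                 v1 = rep(letters[1:5], each = 4),
                 v2 = rep(LETTERS[6:9]))

# Matrix of data frames, built as in the answer
dfmat <- with(df, tapply(seq_len(nrow(df)), list(v1, v2),
                         function(idx) df[idx, ]))

cell_list <- dfmat[1, 1]        # single bracket: a list of length 1
cell_df   <- dfmat[[1, 1]]      # double bracket: the data frame itself
by_name   <- dfmat[["a", "F"]]  # same cell, addressed through the dimnames

class(cell_list)
class(cell_df)
```

Indexing by name is usually the safer choice in package code, since it does not depend on the order of the factor levels.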
Edit: added the attributes of dfmat:

>  attributes(dfmat)
$dim
[1] 5 4

$dimnames
$dimnames[[1]]
[1] "a" "b" "c" "d" "e"

$dimnames[[2]]
[1] "F" "G" "H" "I"    
#------------
> attributes( dfmat[1,1])
NULL
#------------
> attributes( dfmat[1,1][[1]])
$names
[1] "x"  "y"  "v1" "v2"

$row.names
[1] 1

$class
[1] "data.frame"
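Since the question asks about split, here is one possible sketch of that route, not taken from the original answer: split on both columns, then give the resulting list a dim and dimnames. It relies on interaction() varying the first factor fastest, which lines up with R's column-major matrix layout. The seed and the names r_lev, c_lev and cells are assumptions for illustration.

```r
# Sample data from the question; the seed is an assumption for reproducibility
set.seed(123)
df <- data.frame(x = rnorm(20), y = rnorm(20),
                 v1 = rep(letters[1:5], each = 4),
                 v2 = rep(LETTERS[6:9]))

# Fix the level order explicitly so rows and columns are predictable
r_lev <- levels(factor(df$v1))
c_lev <- levels(factor(df$v2))

# Flat list of data frames, ordered a.F, b.F, ..., e.F, a.G, ...
cells <- split(df, list(factor(df$v1, levels = r_lev),
                        factor(df$v2, levels = c_lev)))

# Reshape the list into a 2-D list matrix; setting dim drops the flat names
dim(cells) <- c(length(r_lev), length(c_lev))
dimnames(cells) <- list(r_lev, c_lev)

cells[["b", "H"]]  # the data frame for v1 == "b" and v2 == "H"
```

With the default drop = FALSE, split keeps combinations that have no rows as zero-row data frames, so every cell of the matrix is a data frame.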
