I’ve got a dataframe dat of size 30000 x 50. I also have a

Question

Asked: May 28, 2026

I’ve got a dataframe dat of size 30000 x 50. I also have a separate list that contains pointers to groupings of rows from this dataframe, e.g.,

rows <- list(c("34", "36", "39"), c("45", "46"))

This says that the dataframe rows with rownames “34”, “36”, “39” constitute one grouping, and “45”, “46” constitute another grouping. These are character rownames(dat), not numeric row indices.

Now I want to pull out the groupings from the dataframe into a parallel list, but my code (below) is really, really slow. How can I speed it up?

> system.time(lapply(rows, function(r) {dat[r, ]}))
   user  system elapsed 
 246.09    0.01  247.23

That’s on a very fast computer, R 2.14.1 x64.

1 Answer

Editorial Team · Answer 1 · 2026-05-28T06:45:16+00:00

One of the main issues is the matching of row names: by default, [.data.frame matches row names partially, which you probably don’t want, so you’re better off using match to look up the row positions yourself. To speed it up even further, you can use fmatch from the fastmatch package. This is a minor modification that gives some speedup:

# naive
> system.time(res1 <- lapply(rows,function(r) dat[r,]))
   user  system elapsed 
 69.207   5.545  74.787 

# match
> rn <- rownames(dat)
> system.time(res1 <- lapply(rows,function(r) dat[match(r,rn),]))
   user  system elapsed 
 36.810  10.003  47.082 

# fastmatch
> library(fastmatch)
> rn <- rownames(dat)
> system.time(res1 <- lapply(rows,function(r) dat[fmatch(r,rn),]))
   user  system elapsed 
 19.145   3.012  22.226
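
To make the partial-matching point above concrete, here is a small sketch on a made-up two-row data frame (the names df, "apple" and "banana" are purely illustrative and not from the question). It shows that character row indexing accepts an abbreviated row name, whereas match only accepts exact names:

```r
# Hypothetical toy data frame, only for illustration.
df <- data.frame(x = 1:2, y = c("a", "b"),
                 row.names = c("apple", "banana"),
                 stringsAsFactors = FALSE)

# Character row indexing matches row names partially:
# "app" is a unique prefix of "apple", so that row comes back.
partial_hit <- df["app", ]

# match() only does exact matching, so the same key finds nothing.
exact_pos <- match("app", rownames(df))     # NA
exact_ok  <- match("apple", rownames(df))   # 1
```

Partial matching has to consider prefixes for every key, which is plausibly part of why the naive version is so much slower than the match-based ones in the timings above.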

You can get a further speedup by not using [ at all (it is slow for data frames) and instead splitting the data frame with split. This works if your rows are non-overlapping and cover all rows, so that each row maps to exactly one entry in rows.
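
A minimal sketch of the split idea, on made-up toy data rather than the real 30000 x 50 frame. The variable names group_of_row and res_split are my own, and the sketch assumes that no row name appears in more than one group. Rows that belong to no group get NA and are dropped by split:

```r
# Toy stand-in for the real data (hypothetical values).
set.seed(1)
dat <- data.frame(matrix(rnorm(100 * 5), nrow = 100))
rownames(dat) <- as.character(seq_len(nrow(dat)))
rows <- list(c("34", "36", "39"), c("45", "46"))

# For every row of dat, record which entry of `rows` it belongs to.
# Rows that are in no group keep NA.
group_of_row <- rep(NA_integer_, nrow(dat))
group_of_row[match(unlist(rows), rownames(dat))] <-
  rep(seq_along(rows), sapply(rows, length))

# One pass over the data frame; rows with an NA group are left out.
res_split <- split(dat, group_of_row)
```

Note that split returns the rows of each group in data frame order, not in the order they are listed in rows, and names the result by group index ("1", "2", ...). Whether this beats the fmatch version on your data is something to time rather than assume.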

Depending on your actual data, you may be better off with matrices, whose subsetting operators are native and therefore much faster.
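
A sketch of the matrix variant, again on made-up toy data. It assumes all columns share one type (here numeric); with mixed column types as.matrix would coerce everything to character, so this is only an option if your real data is homogeneous. The names mat and res_mat are illustrative:

```r
# Toy stand-in for the real data (hypothetical values, all numeric).
set.seed(1)
dat <- data.frame(matrix(rnorm(100 * 5), nrow = 100))
rownames(dat) <- as.character(seq_len(nrow(dat)))
rows <- list(c("34", "36", "39"), c("45", "46"))

# Convert once, keeping the row names on the matrix.
mat <- as.matrix(dat, rownames.force = TRUE)
rn <- rownames(mat)

# Same lookup as before, but subsetting a matrix instead of a data frame.
# drop = FALSE keeps single-row groups as matrices.
res_mat <- lapply(rows, function(r) mat[match(r, rn), , drop = FALSE])
```

If the rest of your code needs data frames, each element can be converted back with as.data.frame, though that gives up part of the gain.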
