Just had a conversation with coworkers about this, and we thought it’d be worth

Question

0

Asked: May 22, 20262026-05-22T00:52:35+00:00 2026-05-22T00:52:35+00:00

Just had a conversation with coworkers about this, and we thought it’d be worth

0

Just had a conversation with coworkers about this, and we thought it’d be worth seeing what people out in SO land had to say. Suppose I had a list with N elements, where each element was a vector of length X. Now suppose I wanted to transform that into a data.frame. As with most things in R, there are multiple ways of skinning the proverbial cat, such as as.dataframe, using the plyr package, comboing do.call with cbind, pre-allocating the DF and filling it in, and others.

The problem that was presented was what happens when either N or X (in our case it is X) becomes extremely large. Is there one cat skinning method that’s notably superior when efficiency (particularly in terms of memory) is of the essence?

Report

Leave an answer
Cancel reply

You must login to add an answer.

Need An Account,

1 Answer

Editorial Team · Answer 1 · 2026-05-22T00:52:36+00:00

Since a data.frame is already a list and you know that each list element is the same length (X), the fastest thing would probably be to just update the class and row.names attributes:

set.seed(21)
n <- 1e6
x <- list(x=rnorm(n), y=rnorm(n), z=rnorm(n))
x <- c(x,x,x,x,x,x)

system.time(a <- as.data.frame(x))
system.time(b <- do.call(data.frame,x))
system.time({
  d <- x  # Skip 'c' so Joris doesn't down-vote me! ;-)
  class(d) <- "data.frame"
  rownames(d) <- 1:n
  names(d) <- make.unique(names(d))
})

identical(a, b)  # TRUE
identical(b, d)  # TRUE

Update – this is ~2x faster than creating d:

system.time({
  e <- x
  attr(e, "row.names") <- c(NA_integer_,n)
  attr(e, "class") <- "data.frame"
  attr(e, "names") <- make.names(names(e), unique=TRUE)
})

identical(d, e)  # TRUE

Update 2 – I forgot about memory consumption. The last update makes two copies of e. Using the attributes function reduces that to only one copy.

set.seed(21)
f <- list(x=rnorm(n), y=rnorm(n), z=rnorm(n))
f <- c(f,f,f,f,f,f)
tracemem(f)
system.time({  # makes 2 copies
  attr(f, "row.names") <- c(NA_integer_,n)
  attr(f, "class") <- "data.frame"
  attr(f, "names") <- make.names(names(f), unique=TRUE)
})

set.seed(21)
g <- list(x=rnorm(n), y=rnorm(n), z=rnorm(n))
g <- c(g,g,g,g,g,g)
tracemem(g)
system.time({  # only makes 1 copy
  attributes(g) <- list(row.names=c(NA_integer_,n),
    class="data.frame", names=make.names(names(g), unique=TRUE))
})

identical(f,g)  # TRUE

Sign Up

Sign In

Forgot Password

The Archive Base Latest Questions

Just had a conversation with coworkers about this, and we thought it’d be worth

Leave an answerCancel reply

1 Answer

Leave an answer
Cancel reply