I have this data.frame with equal-length groups (id):
id | amount
--------------
A | 10
A | 54
A | 23
B | 34
B | 76
B | 12
which I would like to transpose by group id to this:
id |
----------------------
A | 10 | 54 | 23
B | 34 | 76 | 12
What is the most efficient way of doing this?
I’ve previously used reshape and dcast but they are very slow indeed! (I have A LOT of data and would love to speed up this bottleneck)
Is there a better strategy? Using data.table or matrices?? Any help would be much appreciated!
# Little data.frame
df <- data.frame(id=c(2,2,2,5,5,5), amount=as.integer(c(10,54,23,34,76,12)))
# Not so little data.frame
set.seed(10)
df <- data.frame(id = rep(sample(1:10000, 10000, replace=FALSE),100), amount=as.integer(floor(runif(1000000, -100000,100000))))
# Create time variable
df$time <- ave(as.numeric(df$id), df$id, FUN = seq_along)
# The base R reshape strategy
system.time(df.reshape <-reshape(df, direction = "wide", idvar="id", timevar="time"))
user system elapsed
6.36 0.31 6.69
# The reshape2 dcast strategy
library(reshape2)
a <- system.time(mm <- melt(df,id.vars=c('id','time'),measure.vars=c('amount')))
b <- system.time(df.dcast <- dcast(mm,id~variable+time,fun.aggregate=mean))
a+b
user system elapsed
14.44 0.00 14.45
UPDATE
Using the fact that each group is equal in length you can use the matrix-function.
system.time(df.matrix <- data.frame(id=unique(df$id), matrix(df$amount, nrow=length(unique(df$id)), byrow=TRUE)))
user system elapsed
0.03 0.00 0.03
Note: This method assumes that the data.frame is presorted by id.
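If the rows are not already grouped by id, sorting first should make the matrix approach safe. The "not so little" data.frame above is such a case: it is built with rep(sample(...), 100), so its ids are interleaved rather than contiguous. Below is a minimal sketch on a small, deliberately shuffled example (the data and the names df.sorted and ids are only for illustration). It relies on order() leaving ties in their original order, so the within-group sequence is kept.

```r
# Small example whose rows are deliberately NOT grouped by id (illustrative data)
df <- data.frame(id = c(5, 2, 5, 2, 2, 5),
                 amount = as.integer(c(34, 10, 76, 54, 23, 12)))

# Sort by id; order() keeps tied rows in their original relative order
df.sorted <- df[order(df$id), ]
ids <- unique(df.sorted$id)

# Each group is now a contiguous run of equal length, so fill the matrix by row
df.matrix <- data.frame(id = ids,
                        matrix(df.sorted$amount, nrow = length(ids), byrow = TRUE))
print(df.matrix)
```

The extra sort adds some cost on large data, so it is worth timing on your own data before relying on it.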
This is not a problem of reshape. The aggregate function from base should be able to handle this.
Isn’t this what you wanted?
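For reference, a minimal sketch of how aggregate can be applied here, using the little data.frame from the question. This is one plausible form of the call, not necessarily the exact code originally posted. When every group has the same length, FUN = c makes aggregate return the amounts as a matrix column; the names agg and wide are only for illustration.

```r
# Little data.frame from the question
df <- data.frame(id = c(2, 2, 2, 5, 5, 5),
                 amount = as.integer(c(10, 54, 23, 34, 76, 12)))

# Collect all amounts of each id; with equal-length groups the
# 'amount' column of the result is a matrix with one row per id
agg <- aggregate(amount ~ id, data = df, FUN = c)

# Optionally expand the matrix column into ordinary data.frame columns
wide <- do.call(data.frame, agg)
print(wide)
```

Note that aggregate returns its result ordered by id.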
Okay, seems like an adapted version of DWin’s solution is the fastest. However, the result will be ordered by id. If you don’t want that, then Aditya’s seems to be the one to use.
Here are the functions and the benchmarking results:
Using aggregate:
Using Aditya’s:
Using modified version of DWin’s:
Benchmarking:
Let me know if I’ve made any errors.