I have a data table “the.data”, where the first column indicates the measurement instrument and the remaining columns hold the hour and the measured data (da, db).
instrument <- c(1,2,3,4,5,1,2,3,4,5)
hour <- c(1,1,1,1,1,2,2,2,2,2)
da <- c(12,14,11,14,10,19,15,16,13,11)
db <- c(21,23,22,29,28,26,24,27,26,22)
the.data <- data.frame(instrument,hour,da,db)
I have also defined groups of instruments; for example, group 1 (g1) refers to instruments 1 and 2.
g1 <- c(1,2)
g2 <- c(4,3,1)
g3 <- c(1,5,2)
g4 <- c(2,4)
g5 <- c(5,3,1,2,6)
groups <- c("g1","g2","g3","g4","g5")
For each group and each data type, I need to find the hour at which the group's sum is at its maximum, and the value of that sum. For example:
g1 hour 1: sum(da)=12+14=26
g1 hour 2: sum(da)=19+15=34
So, for g1 and da the answer is hour 2 and value 34.
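The same check for g1 and da can be written directly in R. This is only a minimal sketch of the desired result for one group and one column, not the slow nested loop; the names g1.rows, g1.sums and best are introduced here for illustration.

```r
# Example data from above
the.data <- data.frame(
  instrument = c(1, 2, 3, 4, 5, 1, 2, 3, 4, 5),
  hour       = c(1, 1, 1, 1, 1, 2, 2, 2, 2, 2),
  da         = c(12, 14, 11, 14, 10, 19, 15, 16, 13, 11),
  db         = c(21, 23, 22, 29, 28, 26, 24, 27, 26, 22)
)
g1 <- c(1, 2)

# Keep only the rows belonging to instruments in g1
g1.rows <- the.data[the.data$instrument %in% g1, ]

# Sum da per hour, then pick the hour with the largest sum
g1.sums <- aggregate(da ~ hour, data = g1.rows, FUN = sum)
best <- g1.sums[which.max(g1.sums$da), ]
print(best)
```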
I did this with a for-loop nested inside another for-loop, but it takes too long (I interrupted it after a few hours). The issue is that the.data has about 100,000 rows and there are about 5,000 groups with 2-50 instruments each.
What would be a good method to do this?
Sincere thanks to all contributors to Stack Overflow.
Update: Now only five groups in examples.
/Chris
The group loop will have to stay, or at best be replaced by something like lapply(). The hour loop, however, can be totally replaced by reformatting to an instrument x hour matrix and then just doing vectorized algebra. For example:
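A minimal sketch of that idea on the question's example data follows. The helper names to.matrix and max.hour and the named list group.list are assumptions made for illustration; they are not part of the original code. The sketch also assumes that instruments missing from the data (such as 6 in g5) should simply be ignored, and that ties go to the earliest hour.

```r
# Example data from the question
the.data <- data.frame(
  instrument = c(1, 2, 3, 4, 5, 1, 2, 3, 4, 5),
  hour       = c(1, 1, 1, 1, 1, 2, 2, 2, 2, 2),
  da         = c(12, 14, 11, 14, 10, 19, 15, 16, 13, 11),
  db         = c(21, 23, 22, 29, 28, 26, 24, 27, 26, 22)
)

# Groups held in a named list (assumption: easier to loop over than
# separate variables g1, g2, ...)
group.list <- list(g1 = c(1, 2), g2 = c(4, 3, 1), g3 = c(1, 5, 2),
                   g4 = c(2, 4), g5 = c(5, 3, 1, 2, 6))

# Reshape one data column into an instrument x hour matrix
to.matrix <- function(col) {
  tapply(the.data[[col]], list(the.data$instrument, the.data$hour), sum)
}

# For one group: sum the member rows per hour and take the maximum
max.hour <- function(m, members) {
  # Assumption: instruments not present in the data are dropped
  rows <- intersect(as.character(members), rownames(m))
  totals <- colSums(m[rows, , drop = FALSE], na.rm = TRUE)
  best <- which.max(totals)  # the first maximum wins on ties
  c(hour = as.numeric(names(totals)[best]), sum = unname(totals[best]))
}

# The matrix is built once per data column; only the group loop remains
m.da <- to.matrix("da")
res.da <- t(sapply(group.list, max.hour, m = m.da))
print(res.da)
```

The same two calls with to.matrix("db") would give the result for db. Because the reshaping happens once per column, each group costs only a row subset and a colSums() call.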