I am trying to optimize the following code.
dim <- c(10000, 100)
m <- matrix(sample(0:10, prod(dim), replace = TRUE), nrow = dim[1], ncol = dim[2])
system.time({
  output <- matrix(0, nrow = dim[1], ncol = dim[2])
  for (i in 1:dim[1]) {
    output[i, 1] <- m[i, 1]
    for (j in 2:dim[2]) {
      output[i, j] <- output[i, j - 1] * 0.5 + m[i, j]
    }
  }
})
Conceptually, it is quite similar to a simple cumulative sum:
system.time({
  output <- matrix(0, nrow = dim[1], ncol = dim[2])
  for (i in 1:dim[1]) {
    output[i, ] <- cumsum(m[i, ])
  }
})
The problem is that the first version is about 100 times slower. Is there any way to build a customized version of cumsum() that would do the trick?
Your case is exactly the same as generating an AR(1) process with coefficient 0.5, that is, output[i, j] = 0.5 * output[i, j - 1] + m[i, j]. You can use the filter() function to generate the data. filter() also supports higher-order recursion, convolution, or a mixture of the two (think of the ARMA model). You may also have a look at convolve() for other convolutions. In addition, you could compile your code to speed up the loop. In my code, the compiled loop and the uncompiled loop are about 111 and 162 times slower than filter(), respectively.
Output:
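As a rough illustration of the recursive-filter idea described above, here is a minimal sketch on a small matrix. The helper name decay_cumsum and its rate argument are hypothetical, introduced only for this example. It relies on stats::filter() with method = "recursive", which treats each column as a separate series, hence the transposes. The check loop is a column-wise variant of the original double loop and is not part of the timing comparison mentioned above.

```r
# Hypothetical helper (name not from the original post): row-wise
# recursion out[i, j] = rate * out[i, j - 1] + m[i, j].
decay_cumsum <- function(m, rate = 0.5) {
  # stats::filter() works down columns, so transpose in and out.
  res <- t(stats::filter(t(m), filter = rate, method = "recursive"))
  # Rebuild a plain numeric matrix, dropping time-series attributes.
  matrix(as.numeric(res), nrow = nrow(m), ncol = ncol(m))
}

set.seed(1)
m <- matrix(sample(0:10, 200 * 20, replace = TRUE), nrow = 200, ncol = 20)

# Reference result: same recursion, looping over columns only.
loop_out <- matrix(0, nrow = nrow(m), ncol = ncol(m))
loop_out[, 1] <- m[, 1]
for (j in 2:ncol(m)) {
  loop_out[, j] <- loop_out[, j - 1] * 0.5 + m[, j]
}

# Both approaches should agree up to floating-point tolerance.
stopifnot(isTRUE(all.equal(loop_out, decay_cumsum(m))))
```

Passing a longer coefficient vector as filter would give a higher-order recursion; whether that is useful depends on the model you have in mind.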