I’m looking to speed up the following algorithm. I pass the function an xts time series and, for each time point, run a principal components analysis on the previous X points (I’m using 500 at the moment), then use the results of that PCA (5 principal components in the following code) to compute some value. Something like this:
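lookback <- 500
ans <- rep(NA_real_, nrow(x))  # preallocate (assuming one numeric value per time point)
for (i in (lookback + 1):nrow(x)) {
  # rows i - lookback through i: the current point plus the previous `lookback` points
  x.now <- x[(i - lookback):i, ]
  x.prcomp <- prcomp(x.now)
  ans[i] <- (some R code on x.prcomp)
}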
I assume this would require me to replicate the lookback rows as columns so that x would be something like cbind(x,lag(x),lag(x,k=2),lag(x,k=3)...lag(x,k=lookback)), and then run prcomp on each line? This seems expensive though. Perhaps some variant of apply? I’m willing to look into Rcpp but wanted to run this by you guys before that.
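For what it’s worth, the lag-replication step may not be needed: an apply-style function can index each window directly. Below is a minimal sketch of that idea, assuming a plain numeric matrix stands in for the xts object (with xts, coredata(x) would give the matrix). The function pc_share and the statistic it computes are made-up placeholders for the real "(some R code on x.prcomp)", and the random data is only there to make the sketch runnable.

```r
set.seed(1)

# Stand-in data (assumption): 700 x 24 numeric matrix instead of the real xts series.
x <- matrix(rnorm(700 * 24), nrow = 700, ncol = 24)
lookback <- 500
n_pc <- 5

# Hypothetical per-window statistic: share of total variance captured by the
# first n_pc principal components. Replace with the real computation.
pc_share <- function(window, n_pc) {
  sdev <- prcomp(window)$sdev
  sum(sdev[seq_len(n_pc)]^2) / sum(sdev^2)
}

# One window per end index, same row range as the for loop above.
ends <- (lookback + 1):nrow(x)
vals <- vapply(
  ends,
  function(i) pc_share(x[(i - lookback):i, , drop = FALSE], n_pc),
  numeric(1)
)

# Pad the front with NA so the result lines up with the rows of x.
ans <- rep(NA_real_, nrow(x))
ans[ends] <- vals
```

I wouldn’t expect vapply by itself to be much faster than the for loop, since each window still needs its own prcomp call. What it does is remove the need to build the lagged columns, and it gives a function shape that is easy to hand to a parallel apply later.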
Edit: Wow, thanks for all the responses. Info on my dataset/algorithm:
- dim(x.xts) is currently 2000×24. But eventually, if this shows promise, it will have to run fast (I’ll give it multiple datasets).
- func(x.xts) takes ~70 seconds. That’s 2000 - 500 = 1500 prcomp calls, each on a newly created 500×24 dataframe, so roughly 47 ms per iteration.
I attempted to use Rprof to see which part of the algo is the most expensive, but it’s my first time using Rprof, so I need some more experience with the tool before I can get intelligible results (thanks for the suggestion).
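For reference, the basic Rprof workflow looks roughly like the sketch below. It uses only base R (Rprof and summaryRprof from utils). The function rolling_pca and the random 2000×24 matrix are hypothetical stand-ins for func and x.xts, so the timings it reports say nothing about the real code.

```r
set.seed(1)

# Stand-in data (assumption): same shape as the real series, but a plain matrix.
x <- matrix(rnorm(2000 * 24), nrow = 2000, ncol = 24)
lookback <- 500

# Hypothetical stand-in for func(): one prcomp call per window.
rolling_pca <- function(x, lookback) {
  ends <- (lookback + 1):nrow(x)
  vapply(
    ends,
    function(i) prcomp(x[(i - lookback):i, , drop = FALSE])$sdev[1],
    numeric(1)
  )
}

prof_file <- tempfile(fileext = ".out")
Rprof(prof_file)                 # start sampling the call stack
res <- rolling_pca(x, lookback)
Rprof(NULL)                      # stop profiling

prof <- summaryRprof(prof_file)
head(prof$by.self, 10)           # time spent inside each function itself
head(prof$by.total, 10)          # time including everything the function calls
```

The by.self table is usually the easier one to read first: functions near the top are where the samples actually landed, while by.total shows which callers those functions sit under.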
I think I will first attempt to roll this into an *apply-type loop, and then look at parallelizing.
On my 4-core desktop, if this doesn’t complete in a reasonable time frame, I would run the chunk using something along the lines of (not tested):