I have a large data frame (named z) that looks like this:
RPos M1
1 -0.00020
2 0.00010
3 -0.00012
4 -0.00035
5 -0.00038
...etc (about 300,000 observations)
It is essentially a time series (although it is actually a data frame, not ts or zoo), where RPos is the index number (explicitly stored) and M1 is any metric.
I have another data frame (named actionlist) with about 30,000 non-consecutive observations. Each value in actionlist’s RPos column represents the last of 34 consecutive points.
My final piece of data is a single data frame (named x) of only 34 consecutive observations.
My goal is to calculate the correlation coefficients between x and each observation in actionlist (which, again, is the end-point of 34 consecutive observations).
To do this I must extract each of these 34-point consecutive segments from z (the large data frame).
Currently, I am doing it like this:
n1 <- 33:0
for (i in 1:nrow(actionlist)) {
  crs[i, 2] <- cor(z[actionlist$RPos[i] + n1, 2], x[, 2])
}
When looking at the Rprof readout this is what I get:
$by.self
              self.time self.pct total.time total.pct
[.data.frame       0.68    25.37       0.98     36.57
.Call              0.22     8.21       0.22      8.21
cor                0.16     5.97       2.30     85.82
...etc
It looks as though [.data.frame is taking the longest.
Specifically I am pretty sure that it is this part:
z[actionlist$RPos[i]+n1,2]
How can I speed up (eliminate the need for?) this part of the function?
I asked a similar question before, except that instead of looking within a restricted list (actionlist) I was looking through every possible window of 34 consecutive observations within z. The answer was posted here, but I cannot figure out how to adapt it to a restricted list.
Any help would be greatly appreciated!
The most straightforward approach is probably to build a matrix containing the data you want to compute the correlation with, and eschew the loop altogether.
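A minimal sketch of that idea follows. The sample data (sizes, random values, the seed) are made up for illustration, and the names m1, offsets, idx and windows are not from the question. It assumes that z$RPos is simply the row number, and it takes each window as rows RPos-33 through RPos, following the description that RPos marks the last of the 34 points. If the windows should instead run the way the loop in the question indexes them (RPos + n1), the offsets need to be changed accordingly.

```r
set.seed(1)

# Hypothetical sample data with the same shape as in the question (smaller sizes).
n <- 3000
z <- data.frame(RPos = 1:n, M1 = rnorm(n))
actionlist <- data.frame(RPos = sort(sample(34:n, 300)))
x <- data.frame(RPos = 1:34, M1 = rnorm(34))

# Pull the column out once as a plain numeric vector, so that no
# data-frame indexing happens for each window.
m1 <- z$M1

# 34 x k index matrix: column j holds the row numbers RPos[j]-33, ..., RPos[j].
# Assumption: z$RPos equals the row number, so RPos can index m1 directly.
offsets <- 33:0
idx <- outer(-offsets, actionlist$RPos, "+")

# Fill a 34 x k matrix of windows with a single vectorised subset.
windows <- matrix(m1[as.vector(idx)], nrow = 34)

# cor() of a matrix against a vector correlates each column with that vector.
crs <- data.frame(RPos = actionlist$RPos,
                  cor  = as.vector(cor(windows, x$M1)))
```

Most of the gain should come from indexing a plain vector once instead of calling [.data.frame thousands of times. With the real sizes the windows matrix has 34 x 30,000 entries, which should fit in memory comfortably.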