I have a long numerical time series of approximately 200,000 rows (let's call it Z).
In a loop, I subset x (about 30) consecutive rows from Z at a time and treat them as the query point q.
I want to locate within Z the y (~300) most correlated time series segments of length x (most correlated with q).
What is an efficient way to accomplish this?
The code below finds the 300 segments you are looking for and runs in 8 seconds on my none too powerful Windows laptop, so it should be fast enough for your purposes.
First, it constructs a 30-by-199971 matrix (Zmat), whose columns contain all of the length-30 “time series segments” you want to examine. A single call to cor(), operating on the vector q and the matrix Zmat, then calculates all of the desired correlation coefficients. Finally, the resultant vector is examined to identify the 300 sequences having the highest correlation coefficients.
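The three steps just described can be sketched in R roughly as follows. This is a minimal illustration, not necessarily the exact code timed above: the random data standing in for Z, the query start position `start`, and the use of embed() to build Zmat are all assumptions made for the example.

```r
set.seed(1)
Z <- rnorm(200000)   # stand-in for the real series (assumption)
x <- 30              # segment length
y <- 300             # number of best matches to keep

start <- 1000                      # hypothetical position of the query
q <- Z[start:(start + x - 1)]      # query segment of length x

# embed() returns one lagged window per row, newest value first.
# Transposing and reversing the rows gives a 30-by-199971 matrix
# whose i-th column is Z[i:(i + x - 1)].
Zmat <- t(embed(Z, x))[x:1, ]

# One call to cor() correlates q with every column of Zmat.
cors <- as.vector(cor(q, Zmat))

# Start indices of the y segments most correlated with q.
top <- order(cors, decreasing = TRUE)[seq_len(y)]
```

Note that the query segment itself sits in Zmat, so it will appear in `top` as the best match, and segments overlapping it may rank highly too. If that is not wanted, those start indices can be dropped from `cors` before ordering.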