I need to apply the Mann-Kendall trend test in R to a large number (about 1 million) of time series of different lengths. I’ve already written a script that reads each time series (in practice, a list of numbers) from the files in a given directory and writes the results to a .txt file.
The problem is that, with about 1 million time series, creating 1 million files isn’t exactly practical. So I thought that putting all the time series into a single .txt file, separated by some symbol such as “#”, would be more manageable. The file looks like this:
1
2
4
5
4
#
2
13
34
#
...
I’m wondering, is it possible to extract such series (between two “#”) in R and then apply the analysis?
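For illustration, here is a minimal sketch of how such a "#"-separated file could be split into separate series with base R. The helper name split_series is hypothetical, and a small in-memory vector stands in for the result of readLines() on the real file; it assumes every non-separator line holds a single number.

```r
# Hypothetical helper: turn the lines of a "#"-separated file into a list
# of numeric vectors, one vector per series.
split_series <- function(lines) {
  lines <- trimws(lines)
  lines <- lines[nzchar(lines)]          # drop blank lines
  is_sep <- lines == "#"
  # Every "#" closes a series, so the series index of a line is the
  # number of separators seen before it, plus one.
  series_id <- cumsum(is_sep) + 1L
  split(as.numeric(lines[!is_sep]), series_id[!is_sep])
}

# Stand-in for: lines <- readLines("series.txt")  (file name is an assumption)
lines <- c("1", "2", "4", "5", "4", "#", "2", "13", "34", "#")
series <- split_series(lines)
```

Each element of series is then a plain numeric vector that could be passed to MannKendall() from the Kendall package, for example with lapply(series, MannKendall).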
EDIT
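Following @acesnap’s hints, I’m now using this code:
library(Kendall)
# Two columns: V1 = series number, V2 = value
a <- read.table("to_r.txt")
numData <- 1017135
for (i in 1:numData) {
  # Pick out the rows that belong to series i
  s1 <- subset(a, a$V1 == i)
  m <- MannKendall(s1$V2)
  # Append one line per series with the five values returned by MannKendall
  cat(m[[1]], " ", m[[2]], " ", m[[3]], " ", m[[4]], " ", m[[5]], "\n",
      file = "monotonic_trend_checking.txt", append = TRUE)
}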
This approach works, but the computation takes ages. Can you suggest a faster approach?
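A likely reason for the slowness is that subset() rescans the whole table once per series, and cat(..., append=TRUE) reopens the output file on every iteration. Below is a minimal sketch of one possible alternative, assuming the same two-column layout (V1 = series number, V2 = value). The helper trend_by_series is hypothetical; it groups the values once with split() and expects a test function that returns five numbers per series, as MannKendall() does.

```r
# Hypothetical helper: group the values once, then apply `trend_fun`
# to each series and collect the five returned numbers per series.
trend_by_series <- function(a, trend_fun) {
  groups <- split(a$V2, a$V1)            # single pass over the data
  res <- vapply(
    groups,
    function(x) as.numeric(unlist(trend_fun(x))),
    numeric(5)                           # assumes exactly five values come back
  )
  t(res)                                 # one row per series
}

# Small made-up example table standing in for read.table("to_r.txt")
a <- data.frame(V1 = c(1, 1, 1, 1, 1, 2, 2, 2, 2),
                V2 = c(1, 2, 4, 5, 4, 2, 13, 34, 20))

# Only run the real test if the Kendall package is installed
if (requireNamespace("Kendall", quietly = TRUE)) {
  out <- trend_by_series(a, Kendall::MannKendall)
  # The results could then be written in one go, for example:
  # write.table(out, "monotonic_trend_checking.txt",
  #             row.names = FALSE, col.names = FALSE)
}
```

Writing the whole result matrix with a single write.table() call avoids reopening the file a million times. Very short series are not treated specially in this sketch, so they behave as they would in the original loop.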
If you number the datasets as they go into the larger file, things get easier: you can then use a for loop and subsetting to pull out each series by its number.
Then do something like: