I have a time series in the following format:
time data value
733408.33 x1
733409.21 x2
733409.56 x3
etc..
The data runs from approximately 01-Jan-2008 to 31-Dec-2010.
I want to separate the data into columns of monthly length.
For example, the first column (January 2008) will comprise the corresponding data values:
(first 01-Jan-2008 data value):(data value immediately preceding the first 01-Feb-2008 value)
Then the second column (February 2008):
(first 01-Feb-2008 data value):(data value immediately preceding the first 01-Mar-2008 value)
et cetera…
Some ideas I’ve been thinking of but don’t know how to put together:
- Convert all serial time numbers (e.g. 733408.33) to character strings with datestr
- Use strmatch('01-January-2008',DatesInChars) to find the indices of the rows corresponding to 01-January-2008
- Tricky part (?): TransformedData(:,i) = OriginalData(start:end)? end = strmatch(1) - 1 and start = 1. Then change start at the end of the loop to strmatch(1) and then run step 2 again to find the next "starting index" and change end to the "new" strmatch(1)-1?
Having it speed optimized would be nice; I am going to apply it on data sampled ~2 million times.
Thanks!
I would use histc with a list of the last days of each month as the second parameter (note: call histc with two output arguments, so that you also get the bin index of every sample).
The edge list can easily be created with datenum or datevec.
This way you don't have to operate on strings, and that should be fast.
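A minimal sketch of that idea, under a few assumptions: the variable names t, x, edges and monthly are made up for illustration, the sample data is invented, and the edges here are the first instant of each month rather than the last days (histc assigns a sample to bin k when edges(k) <= t < edges(k+1), so month starts are the simplest boundaries). Because months differ in length, the result is collected in a cell array instead of a plain matrix.

```matlab
% Hypothetical sample data: serial date numbers (sorted) and matching values
t = [datenum(2008,1,1,8,0,0); datenum(2008,1,15); ...
     datenum(2008,2,1,0,30,0); datenum(2008,2,20); datenum(2008,3,5)];
x = [1; 2; 3; 4; 5];

% Edges: start of every month from Jan 2008 up to Jan 2011 (37 edges, 36 bins).
% datenum carries month numbers above 12 over into the following years.
edges = datenum(2008, (1:37).', 1);

% The second output of histc is the bin (month) index of every sample;
% samples outside the edge range get index 0.
[counts, bin] = histc(t, edges);

% Keep only samples that fall inside one of the 36 monthly bins
valid = bin > 0 & bin < numel(edges);

% Collect the values of each month into one cell per month.
% Since t is sorted, bin is sorted too, so the order within a month is kept.
monthly = accumarray(bin(valid), x(valid), [numel(edges)-1, 1], @(v) {v});
```

With this layout monthly{1} holds the January 2008 values, monthly{2} the February 2008 values, and so on; months without samples come back as empty cells. No string conversion is involved, which is the point of the approach for a series with millions of samples.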
EDIT:
Example with result in a simple data structure (including some code from @Rody):