For instance, let’s say you have ~10 years of daily 1-minute data for the volume of instrument x, from 9:30am to 4:30pm, as follows (in xts format):
Date.Time Volume
2001-01-01 09:30:00 1200
2001-01-01 09:31:00 1110
2001-01-01 09:32:00 1303
All the way through to:
2010-12-20 16:28:00 3200
2010-12-20 16:29:00 4210
2010-12-20 16:30:00 8303
I would like to:
- Get the average volume at each minute for the entire series (i.e. the average volume over all 10 years at 9:30, 9:31, 9:32…16:28, 16:29, 16:30)
How should I best go about:
- Aggregating the data into one minute buckets
- Getting the average of those buckets
- Reconstituting those “average” buckets back to a single xts/zoo time series?
I’ve had a good poke around with aggregate, sapply, period.apply functions etc, but just cannot seem to “bin” the data correctly.
It’s easy enough to solve this with a loop, but that is very slow. I’d prefer to avoid an explicit loop and use a function that takes advantage of compiled code (i.e. an xts-based solution).
Can anyone offer some advice / a solution?
Thanks so much in advance.
First, let’s create some test data:
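A minimal sketch of such test data, assuming the xts package is installed. The three rows and their volumes are made up for illustration: the 09:30 bar occurs on two different days and the 09:31 bar on one, so the per-minute averages are easy to check by hand.

```r
library(xts)  # also attaches zoo

# Made-up sample: 09:30 appears on two days, 09:31 on one day
tt <- as.POSIXct(c("2001-01-01 09:30:00",
                   "2001-01-01 09:31:00",
                   "2001-01-02 09:30:00"), tz = "UTC")

# Fixing the time zone keeps the time-of-day labels reproducible
x <- xts(c(1200, 1110, 1300), order.by = tt, tzone = "UTC")
x
```

With the real data, `x` would simply be the existing 1-minute xts volume series.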
1) aggregate.zoo. Now try converting it to `times` class and aggregating using this one-liner:

1a) aggregate.zoo (variation). Or this variation, which converts the shorter aggregate series to `times` to avoid having to do it on the longer original series:

2) tapply. An alternative would be `tapply`, which is likely faster:

EDIT: simplified (1) and added (1a) and (2)
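The approaches labelled (1), (1a) and (2) might look roughly like the sketch below. It assumes the xts and chron packages are installed (chron provides the `times` class). The small series `x`, the helper vector `hms` and the result names `a1`, `a1a`, `a2` are illustrative choices, not part of the original post.

```r
library(xts)    # also attaches zoo
library(chron)  # provides the times class

# Small made-up series so this block runs on its own
tt <- as.POSIXct(c("2001-01-01 09:30:00", "2001-01-01 09:31:00",
                   "2001-01-02 09:30:00"), tz = "UTC")
x <- xts(c(1200, 1110, 1300), order.by = tt, tzone = "UTC")

# Time-of-day label ("HH:MM:SS") for every observation
hms <- format(time(x), "%H:%M:%S")

# (1) aggregate.zoo, grouping directly by a times index
a1 <- aggregate(as.zoo(x), times(hms), mean)

# (1a) group by the character label, then convert only the
#      short aggregated index to times
ag  <- aggregate(as.zoo(x), hms, mean)
a1a <- zoo(coredata(ag), times(time(ag)))

# (2) tapply on the raw values, then rebuild a zoo series
ta <- tapply(as.numeric(coredata(x)), hms, mean)
a2 <- zoo(as.vector(ta), times(names(ta)))
```

Each result is a zoo series indexed by time of day, with one row per distinct minute. On the full 09:30 to 16:30 data that would be 421 rows.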