I see that Pandas does not allow duplicate time series indexes yet (https://github.com/pydata/pandas/issues/643), though support is due to be added soon. I am wondering whether there is a good way to apply rolling-window means to a dataset with duplicate timestamps, grouped by a multi-index tag/column.
Basically, I have a CSV of unordered events, each consisting of an epoch time, hierarchical tags (tag1, tag2), and the time taken. A small sample:
epochTimeMS,event,tag,timeTakenMS
1331782842801,event1,tag1,16
1331782841535,event1,tag2,1278
1331782842801,event1,tag1,17
1331782842381,event2,tag1,436
What I want to do is build and graph rolling means with varying millisecond windows, by event and by event+tag. This seems like something Pandas should be able to do, but I am not sure whether I need to wait for duplicate time-series index support first. Any thoughts on hacking this in place now?
There’s nothing really to stop you right now.
Accessing specific values by timestamp will cause an exception (this is going to be improved, as you mention), but you can certainly work with the data. Now, if you want a window of fixed length in time (rather than a fixed number of rows), that’s not supported very well yet, but I created an issue here:
https://github.com/pydata/pandas/issues/936
If you could speak up on the mailing list about the API requirements of your application, it would be helpful for me and the guys, since we’re actively working on the time series capabilities right now.
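For reference, here is a minimal sketch of the grouped, time-windowed rolling mean described in the question. It assumes a later pandas release in which `rolling` accepts a time offset such as `"500ms"` on a sorted `DatetimeIndex` (that feature did not exist when this thread was written). The helper name `rolling_mean_by`, the `time` column, and the 500 ms window are illustrative choices, not part of the original post.

```python
import io

import pandas as pd

# The sample rows from the question, inlined so the sketch is self-contained.
csv_text = """epochTimeMS,event,tag,timeTakenMS
1331782842801,event1,tag1,16
1331782841535,event1,tag2,1278
1331782842801,event1,tag1,17
1331782842381,event2,tag1,436
"""

df = pd.read_csv(io.StringIO(csv_text))

# Convert epoch milliseconds to timestamps and sort; time-based windows
# need a monotonic index, but duplicate timestamps are allowed.
df["time"] = pd.to_datetime(df["epochTimeMS"], unit="ms")
df = df.sort_values("time", kind="stable").set_index("time")


def rolling_mean_by(frame, keys, window):
    """Rolling mean of timeTakenMS over a time window, computed per group.

    Hypothetical helper: `keys` is a column name or list of column names,
    `window` is a pandas offset string such as "500ms".
    """
    return frame.groupby(keys)["timeTakenMS"].rolling(window).mean()


by_event = rolling_mean_by(df, "event", "500ms")
by_event_tag = rolling_mean_by(df, ["event", "tag"], "500ms")

print(by_event)
print(by_event_tag)
```

Each result is a Series indexed by the group key(s) plus the timestamp, so one group can be plotted by selecting it and calling `.plot()`. Varying the window is just a matter of passing a different offset string, for example `"100ms"` or `"2s"`.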