If I have some multivariate irregular time series, such as zoo or xts objects with:
> clicks
           user item
2003-01-02 a    i
2003-01-03 a    i
2003-01-08 b    i
2003-01-09 a    j
2003-01-09 c    j
2003-01-10 b    j
> downloads
           user file
2003-01-08 a    f
2003-01-11 b    g
2003-01-11 b    f
> purchases
           user
2003-01-10 a
2003-01-16 b
I can write some code to produce a simple featurization of the above data into a data frame with a row per (user, day) for all days (up to the day of the user’s first purchase), and with these columns:
- # clicks of item i in past 7 days
- # clicks of item i between 7 and 31 days ago
- # total past clicks
- same for item j
- same for downloads f, g
- whether a purchase occurs in the next 7 days
However, I’m curious whether there are easy, elegant, and not-painfully-slow ways to accomplish this using any of the various time series manipulation packages. I looked around at things in zoo and xts but I didn’t find anything promising.
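You can represent each type of event (e.g., “user A clicks item i”) as a time series x, with value 1 each time it occurs and 0 on the other days. The quantities you are interested in can be computed from cumsum(x) (the number of events until today) and its translations (the number of events until k days in the past or the future). For example, the number of events in the past 7 days is the difference between today's cumulative sum and its value 7 days earlier.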
For a single time series:
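A minimal base-R sketch of this idea, using user a's clicks on item i and user a's purchase from the question. The helper name shift_days, the daily grid, and the window conventions are assumptions made for illustration: "past 7 days" is taken to include today, and "next 7 days" to exclude it.

```r
# Hypothetical helper: value of a cumulative series k days earlier (k > 0)
# or later (k < 0). Before the start it is 0; past the end it is the last value.
shift_days <- function(cs, k) {
  n <- length(cs)
  idx <- pmin(pmax(seq_len(n) - k, 0), n)
  c(0, cs)[idx + 1]
}

# Regular daily grid, here up to the day of user a's first purchase
days <- seq(as.Date("2003-01-02"), as.Date("2003-01-10"), by = "day")

# Turn irregular event dates into daily counts on that grid
daily_counts <- function(dates) {
  tabulate(as.integer(dates - days[1]) + 1L, nbins = length(days))
}

clicks_a_i  <- as.Date(c("2003-01-02", "2003-01-03"))  # user a clicks item i
purchases_a <- as.Date("2003-01-10")                   # user a purchases

cs <- cumsum(daily_counts(clicks_a_i))   # clicks up to and including each day
cp <- cumsum(daily_counts(purchases_a))  # purchases up to and including each day

features <- data.frame(
  date          = days,
  clicks_last_7 = cs - shift_days(cs, 7),                  # 7-day window ending today
  clicks_7_31   = shift_days(cs, 7) - shift_days(cs, 31),  # 7 to 30 days ago
  clicks_total  = cs,
  purchase_soon = shift_days(cp, -7) - cp > 0              # purchase in the next 7 days
)
print(features)
```

Every column is a difference of two shifted copies of the same cumulative sum, so changing a window only changes the shift amounts.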
For the whole dataset, you can use ddply (from the plyr package) to apply the same computation to each group of events.
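A sketch of how that could look for the clicks data, assuming the events are first put into a plain data frame with date, user, and item columns. The helper shift_days and the output column names are hypothetical, and the result is left in long form (one row per user, item, and day) over a shared daily grid rather than truncated at each user's first purchase.

```r
library(plyr)  # provides ddply

# Hypothetical helper: cumulative series shifted k days into the past
# (0 before the start of the grid).
shift_days <- function(cs, k) {
  n <- length(cs)
  idx <- pmin(pmax(seq_len(n) - k, 0), n)
  c(0, cs)[idx + 1]
}

clicks <- data.frame(
  date = as.Date(c("2003-01-02", "2003-01-03", "2003-01-08",
                   "2003-01-09", "2003-01-09", "2003-01-10")),
  user = c("a", "a", "b", "a", "c", "b"),
  item = c("i", "i", "i", "j", "j", "j"),
  stringsAsFactors = FALSE
)

# Shared regular daily grid covering all click dates
days <- seq(min(clicks$date), max(clicks$date), by = "day")

# One call per observed (user, item) pair; ddply stacks the results
# and adds the user and item columns back.
features <- ddply(clicks, c("user", "item"), function(d) {
  x  <- tabulate(as.integer(d$date - days[1]) + 1L, nbins = length(days))
  cs <- cumsum(x)
  data.frame(
    date      = days,
    last_7    = cs - shift_days(cs, 7),
    days_7_31 = shift_days(cs, 7) - shift_days(cs, 31),
    total     = cs
  )
})
head(features)
```

Only (user, item) pairs that actually occur get rows here; pairs with no events would need to be added with zero counts, and the long result would still have to be reshaped to one row per (user, day).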