I am still trying to build a detailed time-series data frame. I want monthly data for several measures, grouped by multiple factors. I’m not sure this is possible, as I have not seen an example close to this in the documentation, the vignettes, or on SO.
Here is the sample data I am trying to structure:
clients <- 1:100
dates <- seq(as.Date("2012/1/1"), as.Date("2012/9/1"), "days")
categories <- LETTERS[1:5]
products <- data.frame(clientID = sample(clients, 10000, replace = TRUE),
                       OrderDate = sample(dates, 10000, replace = TRUE),
                       category = sample(categories, 10000, replace = TRUE),
                       numProducts = sample(1:10, 10000, replace = TRUE),
                       OrderTotal = sample(1:100, 10000, replace = TRUE))
The output looks like this:
head(products)
  clientID  OrderDate category numProducts OrderTotal
1       90 2012-03-20        D           9         18
2       66 2012-08-19        A           3         50
3       45 2012-05-25        A          10         75
4       28 2012-01-01        D           4         27
5       71 2012-02-28        A           4         76
6       26 2012-01-28        C           8         89
The structure I am trying to get to would look something like this:
         Category A ... Category E
ClientID Jan2012numProducts Jan2012OrderTotal Feb2012numProducts Feb2012OrderTotal ... Sep2012numProducts Sep2012OrderTotal
1        12                 78                6                  52                ... 0                  0
2        7                  218               3                  15                ... 1                  28
...
99999    20                 192               10                 100               ... 28                 156
I realize that the column names will likely get long and would look something like AJan2012numProducts or AJan2012OrderTotal, and that’s fine.
Here are the points I’m unclear about (again, I can’t find them referenced in the documentation or the vignettes):
1) Can zoo aggregate multiple observation fields at once? In this case, I want the monthly sum of numProducts and OrderTotal at the same time. Even if zoo can’t, I could use the merge function and join on clientID and category.
2) Can zoo group by a factor (or multiple factors) when aggregating? I want to be able to look at clientID and category by month.
3) Is there a way to build the data frame with category and month along the X axis? If not, and I could get the time-series data to simply group by clientID and category, I could then use reshape to make the time series wide using cast. I would need to get the data frame into this structure:
head(df)
clientID      Month category numProducts OrderTotal
       1 2012-01-31        A          12         78
       1 2012-01-31        B           0          0
....
   99999 2012-09-30        D           6         71
   99999 2012-09-30        E           1         28
cast(df, month~category, sum) (or something close to that)
Is any of this possible? Could you help with some examples?
A combination of format.Date, xtabs, and ftable gets you pretty much exactly what you ask for. I shortened the example a bit, but the principle should be clear. If you wanted the month field to be shorter, you could change the name of the dimension in the table object, or you could make a month column and redo all the work with that. (I admit I had trouble figuring out how ‘zoo’ would enter this picture. It looks like a simple tabulation problem at the moment. Although … I’m sure aggregate.zoo is capable of aggregating on multiple criteria and using the sum as the aggregation function.) First the two commands, then a console session output:
Now the output:
This is the shortened example:
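A minimal sketch of the format.Date / xtabs / ftable approach described above, not necessarily the exact commands used in this answer. The seed, the helper names n, small, tab, wide and long, the Month column, the "%Y-%m" month label, and the restriction to three clients and two categories are all assumptions made to keep the illustration small and reproducible. The last step uses base aggregate (not zoo) to produce the long clientID/Month/category layout from the question.

```r
# Sample data as in the question, with a fixed seed (assumption) so runs repeat.
set.seed(42)
clients    <- 1:100
dates      <- seq(as.Date("2012-01-01"), as.Date("2012-09-01"), by = "day")
categories <- LETTERS[1:5]
n          <- 10000
products <- data.frame(
  clientID    = sample(clients, n, replace = TRUE),
  OrderDate   = sample(dates, n, replace = TRUE),
  category    = sample(categories, n, replace = TRUE),
  numProducts = sample(1:10, n, replace = TRUE),
  OrderTotal  = sample(1:100, n, replace = TRUE)
)

# Month key such as "2012-01"; format() dispatches to format.Date here.
products$Month <- format(products$OrderDate, "%Y-%m")

# Sum both measures in one call: a matrix on the left-hand side of the
# xtabs formula adds one extra table dimension holding the two measures.
tab <- xtabs(cbind(numProducts, OrderTotal) ~ clientID + category + Month,
             data = products)

# Shortened example: three clients and two categories, flattened so that
# clients are rows and category x Month x measure run across the columns.
small <- subset(products, clientID <= 3 & category %in% c("A", "B"))
wide  <- ftable(
  xtabs(cbind(numProducts, OrderTotal) ~ clientID + category + Month,
        data = small, drop.unused.levels = TRUE),
  row.vars = 1
)
print(wide)

# Long layout (one row per clientID / Month / category) with both sums.
# Combinations with no orders are simply absent rather than shown as 0.
long <- aggregate(cbind(numProducts, OrderTotal) ~ clientID + Month + category,
                  data = products, FUN = sum)
head(long)
```

Cells of tab with no matching orders come out as 0, which matches the zeros in the desired wide layout. If shorter column labels are wanted, changing the format string in the Month step (for example to "%b%Y") is one option.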