I am having a terrible time running ‘ddply’ over two variables in what seems like it should be a simple command.
Sample data (df):
Brand Day Rev RVP
A 1 2535.00 195.00
B 1 1785.45 43.55
C 1 1730.87 32.66
A 2 920.00 230.00
B 2 248.22 48.99
C 3 16466.00 189.00
A 1 2535.00 195.00
B 3 1785.45 43.55
C 3 1730.87 32.66
A 4 920.00 230.00
B 5 248.22 48.99
C 4 16466.00 189.00
I am using the command:
library(plyr)
df2 <- ddply(df, .(Brand, Day), summarize, Rev = mean(Rev), RVP = sum(RVP))
My data frame has about 2,600 observations, with 45 levels of “Brand” and up to 300 levels of “Day” (which is coded using ‘difftime’).
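A reproducible version of the sample data may help anyone who wants to try this. The sketch makes one assumption: the post only says Day is coded using ‘difftime’, so here it is built with as.difftime(..., units = "days"). It also counts the Brand/Day combinations that actually occur. With about 2,600 rows, the full data can contain at most 2,600 such combinations, even though 45 × 300 = 13,500 are possible.

```r
# Rebuild the sample data from the question.
# Assumption: Day was created as a difftime measured in days.
df <- data.frame(
  Brand = rep(c("A", "B", "C"), 4),
  Day   = as.difftime(c(1, 1, 1, 2, 2, 3, 1, 3, 3, 4, 5, 4), units = "days"),
  Rev   = rep(c(2535.00, 1785.45, 1730.87, 920.00, 248.22, 16466.00), 2),
  RVP   = rep(c(195.00, 43.55, 32.66, 230.00, 48.99, 189.00), 2)
)

# Count the Brand/Day combinations that actually occur in the data;
# these are the non-empty groups a split by both variables produces.
n_groups <- nrow(unique(df[c("Brand", "Day")]))
n_groups
```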
I am able to easily use ‘ddply’ when simply grouping by “Day,” but when I also try to group by “Brand,” my computer freezes up.
Thoughts?
You should read through the help pages for aggregate, by, ave, and tapply, paying close attention to the types of the arguments each one of them expects and to the argument names as well. Then run all of the examples or demo(). The main thing @hadley did with pkg:plyr and reshape/reshape2 was to impose some degree of regularity, but at the expense of speed. I do understand why he did it, especially when I try to use the base::reshape function, but also when I forget, as I repeatedly do, which of these requires a list, which requires the FUN= argument label, which needs interaction() for the grouping variable, and so on, since they are all somewhat different.
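To make those argument differences concrete, here is a base-R sketch (an illustration, not the answerer's own code) that reproduces the question's summary, mean of Rev and sum of RVP per Brand and Day, with aggregate(), and shows how tapply(), ave() and interaction() expect their grouping arguments. For simplicity Day is a plain number here rather than a difftime, and the object names rev_mean, rvp_sum, rev_mean2, rev_matrix, RevGroupMean and grp are made up for the example.

```r
# Sample data from the question; Day is a plain number here
# (the original uses difftime) to keep the example simple.
df <- data.frame(
  Brand = rep(c("A", "B", "C"), 4),
  Day   = c(1, 1, 1, 2, 2, 3, 1, 3, 3, 4, 5, 4),
  Rev   = rep(c(2535.00, 1785.45, 1730.87, 920.00, 248.22, 16466.00), 2),
  RVP   = rep(c(195.00, 43.55, 32.66, 230.00, 48.99, 189.00), 2)
)

# aggregate(), formula method: one FUN per call, so mean(Rev) and
# sum(RVP) are computed separately and then merged on the group keys.
rev_mean <- aggregate(Rev ~ Brand + Day, data = df, FUN = mean)
rvp_sum  <- aggregate(RVP ~ Brand + Day, data = df, FUN = sum)
df2 <- merge(rev_mean, rvp_sum, by = c("Brand", "Day"))

# aggregate(), data frame method: the grouping variables must be a list.
rev_mean2 <- aggregate(df["Rev"], by = list(Brand = df$Brand, Day = df$Day),
                       FUN = mean)

# tapply(): INDEX is also a list; the result is a Brand x Day matrix,
# with NA for combinations that never occur.
rev_matrix <- tapply(df$Rev, list(df$Brand, df$Day), mean)

# ave(): grouping vectors are passed through `...`, so FUN must be named;
# it returns one value per row rather than one per group.
df$RevGroupMean <- ave(df$Rev, df$Brand, df$Day, FUN = mean)

# interaction(): builds a single grouping factor from several;
# drop = TRUE keeps only the combinations present in the data.
grp <- interaction(df$Brand, df$Day, drop = TRUE)
nlevels(grp)
```

The formula form of aggregate() is usually the closest match to the ddply call, but because it applies a single FUN to every response, different summaries per column need separate calls and a merge.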