I just discovered the great plyr package and am taking it for a spin.
A question I have is the following: is there some way to access the grouping variables from within d_ply?
Say I have a dataframe df with columns x,y,z, and I would like to plot for each z the values x versus y. If I do the following:
plotxy = function(df, ...) {plot(df$x, df$y, ...)}
d_ply(df, .(z), plotxy(df, main=.(z)))
then the titles that show up on the plots are all “z”, and not the values of the z variable. Is there a way to access those values from within d_ply?
EDIT: As @Justin pointed out, the above formulation is wrong because I am passing the whole of df to plotxy. Hence the line
d_ply(df, .(z), plotxy(df, main=.(z)))
should be
d_ply(df, .(z), plotxy, main=.(z))
in order to make sense in terms of my original question (I guess that’s also what @joran was hinting at).
However, I realized something else. Even though df gets sliced along z by d_ply, the sub-dataframe that the function receives still has a z column — simply with always the same value. Hence the problem can apparently be solved as follows:
plotxy = function(df, ...) {plot(df$x, df$y, main=df$z[1])}
d_ply(df, .(z), plotxy)
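The observation that each piece still carries its z column can be checked directly. The sketch below is only an illustration under assumptions: the toy data frame is made up, and `dlply()` is swapped in for `d_ply()` because it splits the same way but keeps the return values, which makes it possible to inspect what each call receives.

```r
library(plyr)

# Hypothetical toy data; only the column names x, y, z come from the question.
df <- data.frame(
  x = 1:6,
  y = c(2, 4, 3, 6, 5, 7),
  z = rep(c("a", "b", "c"), each = 2)
)

# For every piece, record its column names and the distinct z values it holds.
seen <- dlply(df, .(z), function(piece) {
  list(
    columns  = names(piece),
    z_values = unique(as.character(piece$z))
  )
})

# Inspect the first piece: z is still a column, with a single distinct value.
str(seen[[1]])
```

Since every piece holds exactly one distinct z value, `df$z[1]` inside the function is a safe way to read the current group.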
By way of example, I’ll expand on Joran’s concern.
Let's use your function and see what we get without plyr:
versus the maybe more expected(?):
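The two calls being contrasted might look roughly like this. It is a sketch with made-up toy data, not a reproduction of any particular snippet: `.(z)` only quotes the name `z` and never looks up the column, so the first plot is expected to be titled with the literal name, as the question reports, while the second builds the title from the actual values.

```r
library(plyr)

# Hypothetical toy data; only the column names x, y, z come from the question.
df <- data.frame(
  x = 1:6,
  y = c(2, 4, 3, 6, 5, 7),
  z = rep(c("a", "b", "c"), each = 2)
)

plotxy <- function(df, ...) {
  plot(df$x, df$y, ...)
}

# .(z) is a quoted variable name, not the contents of df$z.
quoted <- .(z)
print(class(quoted))

# A title built from the values that z actually takes.
expected_title <- paste(unique(as.character(df$z)), collapse = ", ")

pdf(NULL)                             # draw to a null device, no file is written
plotxy(df, main = .(z))               # title comes from the quoted name
plotxy(df, main = expected_title)     # title comes from the data
invisible(dev.off())
```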
However, in your code, you are splitting your data frame on z and then sending the whole data.frame df to your function again. Instead, you could make a wrapper function:
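One way such a wrapper could look is sketched below. The name `plotxy_wrapper` and the toy data are assumptions made for illustration; the argument name `ply.df` follows the text, and the title is taken from the z column of the piece.

```r
library(plyr)

# Hypothetical toy data; only the column names x, y, z come from the question.
df <- data.frame(
  x = 1:6,
  y = c(2, 4, 3, 6, 5, 7),
  z = rep(c("a", "b", "c"), each = 2)
)

plotxy <- function(df, ...) {
  plot(df$x, df$y, ...)
}

# Hypothetical wrapper: d_ply() calls it once per piece of the split data.
plotxy_wrapper <- function(ply.df) {
  title <- as.character(ply.df$z[1])   # every row of a piece shares the same z
  plotxy(ply.df, main = title)
  invisible(title)                     # returned only so the title can be checked
}

pdf(NULL)                              # draw to a null device, no file is written
d_ply(df, .(z), plotxy_wrapper)
invisible(dev.off())
```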
This way the plotxy function only sees the smaller split data.frame ply.df that you pass through the wrapper function.