Other posts suggested that ddply is a good workhorse.
I am trying to learn the xxply functions and I cannot solve this problem.
This is my data:
library(reshape2)  # the tips data set ships with reshape2
(df <- tips[1:5, ])
  total_bill  tip    sex smoker day   time size
1      16.99 1.01 Female     No Sun Dinner    2
2      10.34 1.66   Male     No Sun Dinner    3
3      21.01 3.50   Male     No Sun Dinner    3
4      23.68 3.31   Male     No Sun Dinner    2
5      24.59 3.61 Female     No Sun Dinner    4
and I need to do something like this:
ddply(df,
      .(<do I have to enumerate all the columns I need to operate on here?>),
      function(x) {if (size >= 3) return(size) else return(total_bill + tip)})
(The example is contrived and does not make real-life sense; it only demonstrates the problem I have with larger data.)
- I could not get the ddply code right just by reading the help files. Any advice is appreciated. Or is there a good ddply tutorial?
- I like that with ddply I can just pass my data frame as input, but it is not nice that in the second argument I am forced to enumerate all the columns that I need later. Is there a way to pass the whole row (all columns)?
- I like defining the function on the fly, but I am not sure how to write my pseudocode (the last argument) correctly in R.
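For what it is worth, the second argument of ddply (.variables) only names the grouping variables; it is not a list of the columns the function is allowed to use. Each group is handed to the function as a complete data frame, so every column is available inside it. A minimal sketch, assuming the plyr package is installed and using a hand-built copy of the five rows above; the grouping by sex and the summary columns n and mean_total are purely illustrative:

```r
library(plyr)

# Hand-built copy of the five example rows, so the sketch does not depend on tips
df <- data.frame(
  total_bill = c(16.99, 10.34, 21.01, 23.68, 24.59),
  tip        = c(1.01, 1.66, 3.50, 3.31, 3.61),
  sex        = c("Female", "Male", "Male", "Male", "Female"),
  smoker     = "No",
  day        = "Sun",
  time       = "Dinner",
  size       = c(2L, 3L, 3L, 2L, 4L),
  stringsAsFactors = FALSE
)

# Only sex is listed in .(...), yet the anonymous function can still use
# total_bill and tip, because x is the full subset of df for one value of sex
res <- ddply(df, .(sex), function(x) {
  data.frame(n = nrow(x), mean_total = mean(x$total_bill + x$tip))
})
print(res)
```

The function is defined on the fly as the third argument; ddply then stacks the per-group results and adds the grouping column back.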
Based on your code, it doesn’t look like you need to use plyr here at all. It seems to me you are calculating a new variable for each row of the data.frame. If that’s the case, then just use some base R functions:
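A minimal sketch of such a base R approach, assuming the goal is one new value per row (the column name result is hypothetical). ifelse() is vectorised, so it evaluates the condition for every row at once and no per-row function call is needed; with() lets the column names be used directly:

```r
# Hand-built copy of the relevant columns of the five example rows
df <- data.frame(
  total_bill = c(16.99, 10.34, 21.01, 23.68, 24.59),
  tip        = c(1.01, 1.66, 3.50, 3.31, 3.61),
  size       = c(2L, 3L, 3L, 2L, 4L)
)

# Vectorised per-row rule: size if size >= 3, otherwise total_bill + tip
df$result <- with(df, ifelse(size >= 3, size, total_bill + tip))
print(df)
```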
Sorry if I misunderstood what you are doing. If you do in fact need to pass the entire row of a data.frame into plyr with no grouping variable, perhaps you can treat it as an array with margin = 1, i.e.:
adply(dat, 1, ...)
There is a great introduction to plyr here: http://www.jstatsoft.org/v40/i01/paper
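The row-wise idea can also be written with ddply itself, which maps more directly onto the original pseudocode. The sketch below is an alternative to the adply call, not a reproduction of it: it adds a hypothetical row-id column (id), groups by that column so that each call sees exactly one row, and defines the function on the fly. It assumes plyr is installed:

```r
library(plyr)

df <- data.frame(
  total_bill = c(16.99, 10.34, 21.01, 23.68, 24.59),
  tip        = c(1.01, 1.66, 3.50, 3.31, 3.61),
  size       = c(2L, 3L, 3L, 2L, 4L)
)

# Hypothetical helper column: one unique id per row, used only as the grouping key
df$id <- seq_len(nrow(df))

# Each call receives a one-row data frame x with *all* columns of df,
# so nothing besides the grouping key has to be listed in .(...)
res <- ddply(df, .(id), function(x) {
  x$result <- if (x$size >= 3) x$size else x$total_bill + x$tip
  x
})
print(res)
```

Calling the function once per row is much slower than the vectorised ifelse() version on large data, so this form is mainly worth it when the per-row logic cannot be vectorised.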