I’m a bit confused. I routinely use transform like this
ddply(data.frame, 1, transform, new.column = function(old.col.1,old.col.2,...))
This is also recommended by Hadley.
But recently I asked a question and Hadley stated this:
Don’t use transform. It’s a helper function suitable for interactive use, not for programming with.
So what’s wrong with transform? I think I’m convinced now that this is stupid:
transform(data.frame,col2=fun(col1)).
But is it not very useful in the ddply setting?
There’s a difference between using `transform` within `ddply` and the function `transform()` as a standalone. For a single new column, it is far better (and quicker) to just assign the column directly.

The function combination ddply/transform is especially useful if you have more than one column to change.
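To make the contrast concrete, here is a minimal sketch in base R. The data frame `df` and the helpers `fun` and `fun2` are hypothetical stand-ins, not taken from the original post; the point is only to show direct assignment for one column next to `transform()` for several.

```r
# Hypothetical data and helper functions, used only for illustration.
df <- data.frame(col1 = c(1, 4, 9, 16))
fun  <- function(x) sqrt(x)   # stand-in for any vectorized function
fun2 <- function(x) x * 10    # second stand-in

# One new column: plain assignment, no transform() needed.
single <- df
single$col2 <- fun(single$col1)

# Several new columns at once: transform() keeps the call compact.
multi <- transform(df, col2 = fun(col1), col3 = fun2(col1))

print(single)
print(multi)
```

The same `transform` arguments could be passed through `ddply()` when the calculation has to happen per group.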
And even then, you have the more flexible option of using `within()`, which allows you to use calculated results in the calculations that follow.

The thing with `transform()` is that it is written specifically to be used interactively. If you use it within a function, you might run into trouble. It is similar to `subset()` in that way: they’re convenience functions, but they’re neither fast nor very safe to use within more complex code.

Opinions differ on `ddply()`. In some cases it works quickly and gives very clean and readable code; in other cases I consider it serious overkill. `ddply()` often works faster and more easily when you have to use non-vectorized functions, in which case the above options wouldn’t work. But for that, you also have the option to use `mapply()`, which in this case can also be considerably faster.
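The two alternatives mentioned above, `within()` and `mapply()`, can be sketched roughly as follows. The column names and the function `larger_or_zero` are assumptions made up for this illustration; it does not measure speed, it only shows how each call is written.

```r
# Hypothetical data, used only for illustration.
df <- data.frame(a = c(3, 1, 2), b = c(10, 20, 30))

# within(): a column created in the block can be used by later statements.
df <- within(df, {
  total <- a + b
  share <- a / total   # uses the column computed just above
})

# A hypothetical non-vectorized function: if() only accepts a single value,
# so it cannot be applied to whole columns directly.
larger_or_zero <- function(x, y) {
  if (x > 2) max(x, y) else 0
}

# mapply() calls the function once per row and keeps the row order.
df$picked <- mapply(larger_or_zero, df$a, df$b)

print(df)
```

Whether `mapply()` actually beats `ddply()` depends on the data and the function, so it is worth timing both on your own case.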
The main problem I have with `ddply()` is that the order of your observations is not guaranteed. In a basic comparison, both functions calculate the correct result, but `mapply()` does so faster in this case while preserving the order of the observations in the data frame.
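The ordering issue can be seen in a small sketch like the one below. The data frame is invented for illustration, and the `ddply()` call is guarded so the snippet still runs when the plyr package is not installed; it is not the original example from this answer.

```r
# Rows are deliberately not sorted by the grouping column `g`.
df <- data.frame(g = c("b", "a", "b", "a"), x = 1:4)

# Base R: direct assignment leaves the rows where they were.
base_res <- df
base_res$y <- base_res$x * 2
print(base_res$g)

if (requireNamespace("plyr", quietly = TRUE)) {
  # ddply() splits by `g`, applies transform() per group, and stacks the
  # pieces group by group, so the original row order is lost.
  plyr_res <- plyr::ddply(df, "g", transform, y = x * 2)
  print(plyr_res$g)
}
```

If the original order matters, one option is to carry an explicit row index column through the `ddply()` call and sort on it afterwards.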