…regarding execution time and / or memory.
If this is not true, prove it with a code snippet. Note that speedup by vectorization does not count. The speedup must come from apply (tapply, sapply, …) itself.
The `apply` functions in R don’t provide improved performance over other looping functions (e.g. `for`). One exception to this is `lapply`, which can be a little faster because it does more work in C code than in R (see this question for an example of this).
But in general, the rule is that you should use an apply function for clarity, not for performance.
I would add to this that apply functions have no side effects, which is an important distinction when it comes to functional programming with R. This can be overridden by using `assign` or `<<-`, but that can be very dangerous. Side effects also make a program harder to understand, since a variable’s state depends on the history.
Edit:
To emphasize this, consider a trivial example that recursively calculates the Fibonacci sequence, once with a `for` loop and once with the apply functions. It could be run multiple times to get an accurate measure, but the point is that none of the methods has significantly different performance.
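A minimal sketch of such a comparison follows. The helper name `fibo` and the range `0:20` are assumptions chosen for illustration, and the timings depend on the machine, so none are shown.

```r
# Naive recursive Fibonacci; slow on purpose, so the function calls
# dominate the runtime rather than the looping construct.
fibo <- function(n) {
  if (n < 2) n else fibo(n - 1) + fibo(n - 2)
}

n_values <- 0:20  # assumed range; larger values make the run much slower

# The same work done three ways.
t_for    <- system.time(for (i in n_values) fibo(i))
t_sapply <- system.time(res_sapply <- sapply(n_values, fibo))
t_lapply <- system.time(res_lapply <- lapply(n_values, fibo))

# Compare wall-clock seconds for each construct.
print(c(`for`  = t_for[["elapsed"]],
        sapply = t_sapply[["elapsed"]],
        lapply = t_lapply[["elapsed"]]))
```

Repeating the measurement several times and averaging would smooth out noise from a single run.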
Edit 2:
Regarding the usage of parallel packages for R (e.g. rpvm, Rmpi, snow), these do generally provide `apply` family functions (even the `foreach` package is essentially equivalent, despite the name). A simple case is the `sapply` equivalent in `snow` run on a socket cluster, for which no additional software needs to be installed; otherwise you will need something like PVM or MPI (see Tierney’s clustering page).
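The socket-cluster case can be sketched as follows. Note that this sketch substitutes the `parallel` package that ships with R for `snow`, an assumption made so that it runs without extra installs; `parallel` offers a `parSapply` with the same calling pattern. The two-worker cluster size and the toy function are arbitrary.

```r
library(parallel)  # bundled with R; stands in for snow in this sketch

# Start two local worker processes that communicate over sockets.
cl <- makeCluster(2, type = "PSOCK")

# Parallel counterpart of sapply(1:20, function(x) x + 3).
result <- parSapply(cl, 1:20, function(x) x + 3)

stopCluster(cl)  # release the worker processes
print(result)
```

With `snow` loaded instead, `makeSOCKcluster(c("localhost", "localhost"))` would create a comparable two-worker cluster.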
`snow` has the following apply functions: `parLapply`, `parSapply`, `parApply`, `parRapply`, and `parCapply`.
It makes sense that `apply` functions should be used for parallel execution since they have no side effects. When you change a variable value within a `for` loop, the change is made in the calling environment (globally, if the loop runs at top level). On the other hand, all `apply` functions can safely be used in parallel because changes are local to the function call (unless you try to use `assign` or `<<-`, in which case you can introduce side effects). Needless to say, it’s critical to be careful about local vs. global variables, especially when dealing with parallel execution.
Edit:
A trivial example demonstrates the difference between `for` and `*apply` so far as side effects are concerned: when the same assignment to a variable `df` is made inside each construct, the `df` in the parent environment is altered by `for` but not by `*apply`.
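A sketch of that comparison, assuming `df` is a plain integer vector and using the illustrative multipliers `2:3`:

```r
df <- 1:10

# *apply: the assignment happens inside the anonymous function's own
# environment, so the outer df is left untouched.
invisible(lapply(2:3, function(i) df <- df * i))
df_after_lapply <- df

# for: the loop body runs in the calling environment, so df is overwritten.
for (i in 2:3) df <- df * i
df_after_for <- df

print(df_after_lapply)  # still 1 to 10
print(df_after_for)     # each element multiplied by 2 * 3 = 6
```

Replacing `<-` with `<<-` inside the anonymous function would make `lapply` modify the outer `df` as well, which is exactly the kind of side effect warned about above.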