I know that loops are slow in R and that I should try to do things in a vectorised manner instead.
But, why? Why are loops slow and apply is fast? apply calls several sub-functions — that doesn’t seem fast.
Update: I’m sorry, the question was ill-posed. I was confusing vectorisation with apply. My question should have been,
“Why is vectorisation faster?”
Loops in R are slow for the same reason any interpreted language is slow: every
operation carries around a lot of extra baggage.
Look at `R_execClosure` in `eval.c` (this is the function called to call a
user-defined function). It’s nearly 100 lines long and performs all sorts of
operations — creating an environment for execution, assigning arguments into
the environment, etc.
Think how much less happens when you call a function in C (push args on to
stack, jump, pop args).
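One way to get a feel for that per-call overhead is to run the same loop twice, once doing the addition inline and once routing it through a trivial user-defined function. This is only a rough sketch: the helper names (`add_inline`, `add_pair`, `add_via_call`) and the loop length are made up for illustration, and the timings will depend on your machine and R version.

```r
# Sum 1..n with the addition written directly in the loop body.
add_inline <- function(n) {
  total <- 0
  for (i in seq_len(n)) total <- total + i
  total
}

# Hypothetical trivial closure: each call goes through R's closure machinery.
add_pair <- function(a, b) a + b

# Same sum, but every iteration pays for one extra user-defined function call.
add_via_call <- function(n) {
  total <- 0
  for (i in seq_len(n)) total <- add_pair(total, i)
  total
}

n <- 1e5
inline_time <- system.time(r_inline <- add_inline(n))
call_time   <- system.time(r_call <- add_via_call(n))

print(inline_time)
print(call_time)
```

Both versions compute the same value; any difference in elapsed time is the cost of the extra closure calls.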
So that is why you get timings like these (as joran pointed out in the comment,
it’s not actually `apply` that’s being fast; it’s the internal C loop in `mean`
that’s being fast. `apply` is just regular old R code):

Using a loop: 0.342 seconds.
Using `sum`: unmeasurably small.
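A minimal sketch of the kind of comparison behind timings like these. The vector `x` and its length are assumptions, not necessarily what was originally measured, and the numbers you get will differ by machine and R version.

```r
# Assumed test data: 100,000 doubles.
x <- as.numeric(1:100000)

# Explicit R loop: every iteration is interpreted (indexing, addition, assignment).
loop_time <- system.time({
  total <- 0
  for (i in seq_along(x)) {
    total <- total + x[[i]]
  }
})

# Vectorised: a single call into sum(), which loops in C internally.
sum_time <- system.time(vec_total <- sum(x))

print(loop_time)
print(sum_time)
```

The two totals agree; only the amount of work done per element differs.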
It’s a little disconcerting because, asymptotically, the loop is just as good
as `sum`; there’s no practical reason it should be slow; it’s just doing more
extra work each iteration.
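So consider what happens to a loop’s running time when a value in its body is wrapped in redundant parentheses.
(That example was discovered by Radford Neal.)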
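A sketch of that parentheses experiment, under some assumptions: the iteration count and the number of parentheses are chosen for illustration, and no timings are quoted because they depend heavily on the R version (the byte-code compiler in newer releases may narrow the gap).

```r
n <- 100000

# Loop whose body evaluates a bare constant.
plain_time <- system.time({
  i <- 0
  while (i < n) {
    10
    i <- i + 1
  }
})

# Same loop, but the constant is wrapped in ten pairs of redundant parentheses.
# Each `(` is a function call that the interpreter has to look up and evaluate.
paren_time <- system.time({
  j <- 0
  while (j < n) {
    ((((((((((10))))))))))
    j <- j + 1
  }
})

print(plain_time)
print(paren_time)
```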
Because `(` in R is an operator, and actually requires a name lookup every time
you use it (which is why you can rebind it to a function of your own).

Or, in general, interpreted operations (in any language) have more steps. Of
course, those steps provide benefits as well: you couldn’t do that `(` trick in C.
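To see that `(` really is looked up by name, you can shadow it with your own function. The sketch below does this inside `local()` so the built-in is left alone everywhere else; the variable name `paren_demo` is just for illustration.

```r
paren_demo <- local({
  # Shadow the built-in `(` within this scope only.
  `(` <- function(x) 2
  # This now calls the function above rather than the built-in.
  (3)
})

print(paren_demo)  # 2: the shadowed `(` ignored its argument

# Outside the local scope, the built-in `(` is untouched.
print((3))         # 3
```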