If you run:
mod <- lm(mpg ~ factor(cyl), data=mtcars)
It runs, because lm knows to look in mtcars to find both mpg and cyl.
Yet mean(mpg) fails because it can’t find mpg, so you have to write mean(mtcars$mpg) instead.
How do you code a function so that it knows to look in ‘data’ for the variables?
myfun <- function(a, b, data) {
  return(a + b)
}
This will work with:
myfun(mtcars$mpg, mtcars$hp)
but will fail with:
myfun(mpg, hp, data = mtcars)
Cheers
Here’s how I would code myfun(): capture the expression with substitute() and evaluate it with eval(), using data as the environment. If you’re familiar with with(), it’s interesting to see that it works in almost exactly the same way.

In both cases, the key idea is to first create an expression from the symbols passed in as arguments and then evaluate that expression using data as the ‘environment’ of the evaluation.

The first part (e.g. turning a + b into the expression mpg + hp) is possible thanks to substitute(). The second part is possible because eval() was beautifully designed, such that it can take a data.frame as its evaluation environment.
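
A minimal sketch of that idea, assuming myfun() should simply return a + b with the columns of data in scope. The enclos = parent.frame() argument and the variable name bonus are my own choices for illustration; enclos tells eval() where to look for names that are not columns of data.

```r
# Sketch: evaluate a + b so that a and b may name columns of `data`.
myfun <- function(a, b, data) {
  # substitute() swaps a and b for the unevaluated expressions the caller
  # supplied, so a + b becomes e.g. mpg + hp
  expr <- substitute(a + b)
  # eval() accepts a data frame as envir; names that are not columns are
  # looked up in enclos, here the caller's frame
  eval(expr, envir = data, enclos = parent.frame())
}

# Columns are found in mtcars
res_cols <- myfun(mpg, hp, data = mtcars)

# The same computation written with with()
res_with <- with(mtcars, mpg + hp)

# A name that is not a column falls back to the calling environment
bonus <- 10
res_mixed <- myfun(mpg, bonus, data = mtcars)
```

One thing to keep in mind with this approach: because the arguments are captured unevaluated, a column in data shadows a variable of the same name in the calling environment.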