As the subset() manual states:
Warning: This is a convenience function intended for use interactively
From this great article I learned not only the reason behind this warning, but also gained a good understanding of substitute(), match.call(), eval(), quote(), calls, promises and other related R subjects, which are a little bit complicated.
Now I understand what the warning above is for. A super-simple implementation of subset() could be as follows:
subset = function(x, condition) x[eval(substitute(condition), envir=x),]
While subset(mtcars, cyl==4) returns the rows of mtcars that satisfy cyl==4, wrapping subset() in another function fails:
sub = function(x, condition) subset(x, condition)
sub(mtcars, cyl == 4)
# Error in eval(expr, envir, enclos) : object 'cyl' not found
Using the original version of subset() produces exactly the same error. This is due to a limitation of the substitute()/eval() pair: it works fine when the condition is supplied directly as cyl==4, but when the condition is passed through the wrapping function sub(), substitute() inside subset() no longer yields cyl==4. It yields the bare symbol condition from the body of sub(), and evaluating that symbol ends up looking for cyl outside the data frame, so eval() fails.
But is there any other implementation of subset() with exactly the same arguments that would be safe for programming, i.e. able to evaluate its condition when it is called by another function?
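To make the difference concrete, here is a minimal sketch of what substitute() captures in each case. The names capture() and wrap_capture() are hypothetical helpers introduced only for this illustration; they are not part of base R or of the article.

```r
# Hypothetical helper: returns the unevaluated expression supplied as 'condition'.
capture <- function(condition) substitute(condition)

# Hypothetical wrapper, analogous to sub() above.
wrap_capture <- function(condition) capture(condition)

direct  <- capture(cyl == 4)       # called directly: captures the call cyl == 4
wrapped <- wrap_capture(cyl == 4)  # called through a wrapper: captures the symbol 'condition'

print(direct)
print(wrapped)
```

In the wrapped case the captured expression is just the name condition, so the column name cyl is no longer visible to the code that evaluates it inside the data frame.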
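For comparison, the route the subset() manual itself recommends for programming is standard subsetting with [. The sketch below sidesteps non-standard evaluation rather than answering the question, because it does not keep the same arguments; filter_by_value() and its parameters are hypothetical names chosen for this example.

```r
# Hypothetical helper: select rows where a named column equals a value,
# using only standard evaluation, so it can be wrapped freely.
filter_by_value <- function(x, column, value) {
  keep <- which(x[[column]] == value)  # which() drops rows where the comparison is NA
  x[keep, , drop = FALSE]
}

# A wrapper around it works, because nothing depends on unevaluated expressions.
wrapper <- function(x, column, value) filter_by_value(x, column, value)

res <- wrapper(mtcars, "cyl", 4)
print(nrow(res))
```

The cost is that the condition has to be expressed through ordinary arguments (a column name and a value) instead of a bare expression such as cyl == 4.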
Just because it’s such mind-bending fun (??), here is a slightly different solution that addresses a problem Hadley pointed to in comments to my accepted solution.
Hadley posted a gist demonstrating a situation in which my accepted function goes awry. The twist in that example (copied below) is that a symbol passed to SUBSET() is defined in the body (rather than the arguments) of one of the calling functions; it thus gets captured by substitute() instead of the intended global variable. Confusing stuff, I know.
Here is a better function that will only substitute the values of symbols found in calling functions’ argument lists. It works in all of the situations that Hadley or I have so far proposed.
IMPORTANT: Please note that this still is not (nor can it be made into) a generally useful function. There’s simply no way for the function to know which symbols you want it to use in all of the substitutions it performs as it works up the call stack. There are many situations in which users would want it to use the values of symbols assigned to within function bodies, but this function will always ignore those.