I came across a confusing “feature” of the subset function (using a column name as the name of the vector used for subsetting does not work):
data(iris)
Species <- unique(iris$Species)
i <- 2
Species[i]
subset(iris, subset = Species == Species[i])
sp <- unique(iris$Species)
sp[i]
subset(iris, subset = Species == sp[i])
Could someone explain to me what happens here and why?
subset() will first look inside the data frame for any object you mention, so in your first example Species[i] returns ‘setosa’ (the same as iris$Species[i]). Only when the object you specify cannot be found inside the data frame does R look in the parent frames, where it finds the correct object.

So it all does work, you just don’t understand how it works. You could have read this in the help file (?subset): the subset expression is evaluated in the data frame, so columns can be referred to by name as variables in the expression.
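To see the effect of that lookup order, here is a small sketch that reruns both calls from the question and checks which species each one returns. The names res_col and res_sp are introduced only for this illustration.

```r
data(iris)
Species <- unique(iris$Species)   # vector with the same name as a column
sp <- unique(iris$Species)        # same values, but a name that is not a column
i <- 2

# Species resolves to the column iris$Species, so Species[i] is the
# second element of that column, not the second unique species.
res_col <- subset(iris, subset = Species == Species[i])
as.character(unique(res_col$Species))

# sp is not a column, so it is found outside the data frame,
# and sp[i] is the second unique species.
res_sp <- subset(iris, subset = Species == sp[i])
as.character(unique(res_sp$Species))
```

The first call keeps the ‘setosa’ rows and the second keeps the ‘versicolor’ rows, which is the difference the question ran into.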
How does this come about?
The reason is the following lines of code in subset():
e <- substitute(subset)
r <- eval(e, x, parent.frame())
subset (or e) is, in your example, Species == Species[i]
x is, in your example, iris
parent.frame() returns, in your example, the global environment.
The second argument of the call to eval, x, is called envir. It is the environment (or list, or data frame, …) in which the expression is evaluated. In your case, R evaluates Species == Species[i] inside x, which is your data frame.
The third argument, parent.frame(), is the enclosure. This is the environment that encloses the data frame you specified as the environment, and it is the place where R will look when a variable isn’t found in the data frame.
See also ?eval
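The roles of envir and the enclosure can be tried out directly with eval(). The sketch below is only an illustration: here, in_df, in_env and my_subset() are names made up for this example, and my_subset() is a stripped-down stand-in, not the real subset() code.

```r
data(iris)
Species <- unique(iris$Species)   # vector that shares its name with a column
sp <- unique(iris$Species)
i <- 2

here <- environment()             # the environment this code is running in

# An unevaluated expression, similar to what substitute() hands to eval().
e <- quote(Species[i])

# envir = iris, enclosure = here: Species is found in the data frame first;
# i is not a column, so R falls back to the enclosure to find it.
in_df <- eval(e, iris, here)

# Evaluated in 'here' alone, Species is the three-element vector instead.
in_env <- eval(e, here)

as.character(in_df)
as.character(in_env)

# Hypothetical minimal version of the same mechanism (assumption: simplified,
# rows only, no select argument).
my_subset <- function(x, subset) {
  e <- substitute(subset)            # capture the expression unevaluated
  r <- eval(e, x, parent.frame())    # columns of x first, then the caller
  x[r & !is.na(r), , drop = FALSE]   # keep rows where the condition is TRUE
}

nrow(my_subset(iris, Species == sp[i]))
```

Because sp is not a column of iris, the last call picks it up from the calling environment, just as the real subset() does in the second example of the question.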