In R, is it possible to perform “immediate” logical indexing on a function’s output?
To clarify this somewhat obscure question, here’s a little everyday example the like of which I am sure many people have come across before. Suppose we have a vector “data”, like the following:
data <- c(1,1,3,5,6,6,8,10,14,15,15,20)
If we now apply the function “tabulate” to this vector, the result will be:
tabulate(data)
[1] 2 0 1 0 1 2 0 1 0 1 0 0 0 1 2 0 0 0 0 1
However, it is often desirable to access only those entries of the vector which are (in this case) non-zero, which would traditionally be done like so (I guess…):
tabulate(data)[tabulate(data) != 0]
[1] 2 1 1 2 1 1 1 2 1
However, in this case “tabulate(data)” would need to be calculated twice, which appears inefficient or even wasteful; at least, it is definitely not elegant. Likewise, storing the result of “tabulate(data)” in a temporary variable can be cumbersome if one works with large datasets.
My question now simply is: does a simple, more elegant (syntactic) workaround for this kind of problem exist? Something like a “magic” direct.index function that does the job? Like so,
direct.index(tabulate(data), condition='!= 0')
…which would basically discard all values that do not meet the indexing condition already at the time of computation, making the whole process faster and more efficient.
The concrete problem with zero-removal from “tabulate” results is given here for simplicity; in fact, I’ve scratched my head about this in very many different situations. Maybe I also just have some basic misconception about R…
By the way, I’ve looked into “?subset”, but that does not seem to be what I’m looking for.
A version of the function written by hand
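A minimal sketch of what such a hand-written helper could look like. The name direct.index is borrowed from the question, and passing the condition as a function instead of a string is an assumption made here for simplicity; neither exists in base R.

```r
# Hypothetical helper (not part of base R): the argument `x` is evaluated
# only once, then the elements for which `condition` is TRUE are kept.
direct.index <- function(x, condition) {
  x[condition(x)]
}

data <- c(1, 1, 3, 5, 6, 6, 8, 10, 14, 15, 15, 20)

# tabulate(data) is computed a single time inside the call
result <- direct.index(tabulate(data), function(v) v != 0)
result
# [1] 2 1 1 2 1 1 1 2 1
```

Note that this only avoids the second call to tabulate; the full result is still computed before the zeros are dropped.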
But I think you are looking for the function “table”.
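As a short illustration with the vector from the question: table counts only the values that actually occur, so there are no zero entries to remove afterwards. The variable name counts is just a placeholder chosen for this sketch.

```r
data <- c(1, 1, 3, 5, 6, 6, 8, 10, 14, 15, 15, 20)

# table() returns one count per distinct value present in `data`
counts <- table(data)

# The bare counts, without the value labels
as.vector(counts)
# [1] 2 1 1 2 1 1 1 2 1

# The distinct values themselves, stored as character names
names(counts)
```

The counts match the non-zero entries of tabulate(data), and the names tell you which value each count belongs to.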