The great findInterval() function in R uses left-closed sub-intervals in its vec argument, as shown in its docs:
if i <- findInterval(x,v), we have v[i[j]] <= x[j] < v[i[j] + 1]
If I want right-closed sub-intervals, what are my options? The best I’ve come up with is this:
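findInterval.rightClosed <- function(x, vec, ...) {
  fi <- findInterval(x, vec, ...)
  # Step back one interval wherever x sits exactly on a break.
  # pmax() guards fi == 0 (x below vec[1]); vec[0] would be dropped and misalign the comparison.
  fi - (fi > 0 & x == vec[pmax(fi, 1)])
}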
Another one also works:
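findInterval.rightClosed2 <- function(x, vec, ...) {
  # Negating x and reversing the negated vec mirrors the problem, so left-closed becomes right-closed.
  length(vec) - findInterval(-x, -rev(vec), ...)
}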
Here’s a little test:
x <- c(3, 6, 7, 7, 29, 37, 52)
vec <- c(2, 5, 6, 35)
findInterval(x, vec)
# [1] 1 3 3 3 3 4 4
findInterval.rightClosed(x, vec)
# [1] 1 2 3 3 3 4 4
findInterval.rightClosed2(x, vec)
# [1] 1 2 3 3 3 4 4
But I’d like to see any other solutions if there’s a better one. By “better”, I mean “somehow more satisfying” or “doesn’t feel like a kludge” or maybe even “more efficient”. =)
(Note that there’s a rightmost.closed argument to findInterval(), but it’s different – it only refers to the final sub-interval and has a different meaning.)
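To make the difference concrete, here is a small sketch (the x values are chosen just for illustration). Setting rightmost.closed = TRUE only changes the result for a value exactly equal to the last element of vec; values sitting on the other breaks are still treated as left-closed.

```r
vec <- c(2, 5, 6, 35)
x   <- c(5, 6, 35)   # each value sits exactly on a break

default_res <- findInterval(x, vec)
closed_res  <- findInterval(x, vec, rightmost.closed = TRUE)

print(default_res)
# [1] 2 3 4
print(closed_res)
# [1] 2 3 3
```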
EDIT: Major clean-up in all aisles.
You might look at cut. By default, cut makes left-open and right-closed intervals, and that can be changed using the appropriate argument (right). To use your example:
Now create four functions that should do the same thing: two from the OP, one from Josh O’Brien, and then cut. Two arguments to cut have been changed from default settings:
include.lowest = TRUE will create an interval closed on both sides for the smallest (leftmost) interval.
labels = FALSE will cause cut to return simply the integer values for the bins instead of creating a factor, which it otherwise does.
Do all functions return the same result? Yup. (Notice the use of cutVec for cutFun.)
Now a more demanding vector to bin:
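A minimal sketch of the cut approach described above, applied first to the question's example and then to a larger input. The names cutFun and cutVec come from the text, but their definitions here are assumptions, and bigX and bigCutVec are hypothetical stand-ins rather than the original data.

```r
x   <- c(3, 6, 7, 7, 29, 37, 52)
vec <- c(2, 5, 6, 35)

# Assumption: cutVec extends vec so that the breaks cover all of x;
# cut() returns NA for values outside the range of its breaks.
cutVec <- c(vec, max(x))

# Assumed definition of cutFun, using the two non-default arguments named above.
cutFun <- function(x, vec) {
  cut(x, vec, include.lowest = TRUE, labels = FALSE)
}

small <- cutFun(x, cutVec)
print(small)
# [1] 1 2 3 3 3 4 4

# A larger, hypothetical input to bin.
set.seed(1)
bigX      <- runif(1e5, min = 2, max = 100)
bigCutVec <- c(vec, max(bigX))
big       <- cutFun(bigX, bigCutVec)
print(table(big))
```

One difference to keep in mind: with include.lowest = TRUE, a value equal to the smallest break lands in bin 1, and a value below it gives NA, whereas the findInterval-based versions return 0 in both cases.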
Test whether identical (note use of unname):
And benchmark:
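A rough way to run such a comparison with base R only, as a sketch. The data, the helper timeIt, and the short function names are hypothetical, and only two of the four functions are included. The timings it prints depend on the machine and R version and are not the figures from the original run.

```r
# Hypothetical input; the original benchmark vector is not shown.
set.seed(42)
vec    <- c(2, 5, 6, 35)
bigX   <- runif(1e5, min = 2, max = 100)
cutVec <- c(vec, max(bigX))   # breaks must cover all of bigX for cut()

rightClosed2 <- function(x, vec) length(vec) - findInterval(-x, -rev(vec))
cutFun       <- function(x, vec) cut(x, vec, include.lowest = TRUE, labels = FALSE)

# Sanity check: both approaches should agree on this input.
stopifnot(identical(rightClosed2(bigX, vec), cutFun(bigX, cutVec)))

# Elapsed seconds for n repeated calls of a zero-argument function.
timeIt <- function(f, n = 10) {
  system.time(for (i in seq_len(n)) f())[["elapsed"]]
}

timings <- c(
  rightClosed2 = timeIt(function() rightClosed2(bigX, vec)),
  cutFun       = timeIt(function() cutFun(bigX, cutVec))
)
print(timings)
```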
From this run, cut seems to be the fastest.