I’m basically looking for a way to do a variation of this Ruby script in R.
I have an arbitrary list of numbers (steps of a moderator for a regression plot in this case) which have unequal distances from each other, and I’d like to round values which are within a range around these numbers to the nearest number in the list.
The ranges don’t overlap.
arbitrary.numbers <- c(4,10,15) / 10
numbers <- c(16:1 / 10, 0.39, 1.45)
range <- 0.1
Expected output:
numbers
## 1.6 1.5 1.4 1.3 1.2 1.1 1.0 0.9 0.8 0.7 0.6 0.5 0.4 0.3 0.2 0.1 0.39 1.45
round_to_nearest_neighbour_in_range(numbers,arbitrary.numbers,range)
## 1.5 1.5 1.5 1.3 1.2 1.0 1.0 1.0 0.8 0.7 0.6 0.4 0.4 0.4 0.2 0.1 0.4 1.5
I’ve got a little helper function that might do for my specific problem, but it’s not very flexible and it contains a loop. I can post it here, but I think a real solution would look completely different.
The different answers, timed for speed on a million numbers:
> numbers = rep(numbers,length.out = 1000000)
> system.time({ mvg.round(numbers,arbitrary.numbers,range) })[3]
elapsed
0.067
> system.time({ rinker.loop.round(numbers,arbitrary.numbers,range) })[3]
elapsed
0.289
> system.time({ rinker.round(numbers,arbitrary.numbers,range) })[3]
elapsed
1.403
> system.time({ nograpes.round(numbers,arbitrary.numbers,range) })[3]
elapsed
1.971
> system.time({ january.round(numbers,arbitrary.numbers,range) })[3]
elapsed
16.12
> system.time({ shariff.round(numbers,arbitrary.numbers,range) })[3]
elapsed
15.833
> system.time({ mplourde.round(numbers,arbitrary.numbers,range) })[3]
elapsed
9.613
> system.time({ kohske.round(numbers,arbitrary.numbers,range) })[3]
elapsed
26.274
MvG’s function is the fastest, roughly 4 times faster than Tyler Rinker’s second function (0.067 s versus 0.289 s).
Yet another solution using `findInterval`:
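A minimal sketch of such a `findInterval`-based function, written to match the description in this answer. The name `snap_round` is made up for illustration and the code assumes `numbers` contains no `NA` values; it is not necessarily identical to the function timed as `mvg.round` in the question.

```r
# Hypothetical helper name; a sketch, not a definitive implementation.
# Assumes `numbers` has no NA values and the snapping ranges do not overlap.
snap_round <- function(numbers, arbitrary.numbers, range) {
  arbitrary.numbers <- sort(arbitrary.numbers)  # findInterval needs sorted breaks
  # Index of the largest target t with t - range <= x (0 if there is none).
  idx <- findInterval(numbers, arbitrary.numbers - range)
  # Prepend -Inf so that index 0 maps to a value that can never snap.
  nearest <- c(-Inf, arbitrary.numbers)[idx + 1]
  # One-sided check: nearest - range <= x already holds by construction.
  snap <- numbers - nearest <= range
  numbers[snap] <- nearest[snap]
  numbers
}

arbitrary.numbers <- c(4, 10, 15) / 10
snap_round(c(1.45, 0.39, 1.2, 0.7, 1.04), arbitrary.numbers, 0.1)
## 1.5 0.4 1.2 0.7 1.0
```

Inputs that lie exactly on the edge of a range (such as 0.3 or 1.6 with these targets) can fall on either side because of floating-point rounding, so it is safer not to rely on them snapping.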
`nearest` in the above code is not really mathematically the nearest number. Instead, it is the largest arbitrary number such that `nearest[i] - range <= numbers[i]`, or equivalently `nearest[i] <= numbers[i] + range`. So in one go we find the largest arbitrary number which is either in the snapping range for a given input number, or still too small for that. For this reason, we only need to check one way for `snap`. No absolute value is required, and even the squaring from a previous revision of this post was unnecessary.
Thanks to Interval search on a data frame for the pointer to `findInterval`, as I found it there before recognizing it in the answer by nograpes.
If, in contrast to your original question, you had overlapping ranges, you could write things like this:
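One way to write the overlapping-range variant described here is sketched below. The name `snap_round_nearest` and the tie-breaking rule (ties go to the lower neighbour) are assumptions for illustration, and `numbers` is assumed to contain no `NA` values.

```r
# Hypothetical helper name; a sketch, not a definitive implementation.
# Assumes `numbers` has no NA values. Ranges around the targets may overlap.
snap_round_nearest <- function(numbers, arbitrary.numbers, range) {
  arbitrary.numbers <- sort(arbitrary.numbers)
  idx <- findInterval(numbers, arbitrary.numbers)
  # Targets just below (or equal to) and just above each input; the
  # infinite sentinels stand in for "no neighbour on that side".
  lo <- c(-Inf, arbitrary.numbers)[idx + 1]
  hi <- c(arbitrary.numbers, Inf)[idx + 1]
  # Pick whichever endpoint is closer (ties go to the lower one).
  use.hi <- hi - numbers < numbers - lo
  nearest <- lo
  nearest[use.hi] <- hi[use.hi]
  # Snap only when the true nearest target is within range.
  snap <- abs(numbers - nearest) <= range
  numbers[snap] <- nearest[snap]
  numbers
}

# Targets 1.0 and 1.15 with range 0.1 have overlapping snapping ranges.
snap_round_nearest(c(1.06, 1.09, 1.3, 0.95, 1.2), c(1.0, 1.15), 0.1)
## 1.00 1.15 1.30 1.00 1.15
```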
In this code, `nearest` really ends up being the nearest number. This is achieved by considering both endpoints of every interval. In spirit, this is very much like the version by nograpes, but it avoids using `ifelse` and `NA`, which should benefit performance as it reduces the number of branching instructions.