I have two arrays of floats and want to find the values that match within a certain tolerance.
This is what I have so far:
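```python
import numpy as np

for vx in range(len(arr1)):
    # index of the arr2 element closest to arr1[vx]; scans all of arr2 on every iteration
    match = np.abs(arr2 - arr1[vx]).argmin()
    if abs(arr1[vx] - arr2[match]) < 0.375:
        point = arr2[match]
```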
The problem is that arr1 contains 150000 elements and arr2 around 110000 elements. This takes an awful amount of time. Do you have suggestions to speed things up?
In addition to not being vectorized, your current search is O(n*m), where n is the size of arr2 and m is the size of arr1: for each of the m elements of arr1 you scan all n elements of arr2. In this kind of search it helps to sort one of the arrays (here arr2) so you can use a binary search. Sorting costs O(n log n) and the m binary searches cost O(m log n), so the total of O((n + m) log n) is far smaller than O(n*m) once m is large.
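To put rough numbers on that with the sizes from the question (n ≈ 110,000 for arr2, m ≈ 150,000 for arr1, and log2(110,000) ≈ 16.75), here is a back-of-the-envelope count of element comparisons. It ignores constant factors, so treat it only as an order-of-magnitude estimate, not a predicted speedup:

$$
\begin{aligned}
n \cdot m &= 110\,000 \times 150\,000 \approx 1.65 \times 10^{10} \\
n \log_2 n + m \log_2 n &\approx (110\,000 + 150\,000) \times 16.75 \approx 4.4 \times 10^{6}
\end{aligned}
$$

That is a difference of roughly three to four orders of magnitude in the number of comparisons.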
Here is how you can do the search in a vectorized way using the sorted array:
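A minimal sketch of that idea, using `np.sort` plus `np.searchsorted` to do all the binary searches in one vectorized call. This is an illustration under stated assumptions, not necessarily the original answer's code: the function name `match_within_tolerance` is hypothetical, both inputs are assumed to be 1-D float arrays, and arr2 is assumed to be non-empty. For each value of arr1, `searchsorted` returns the insertion point in the sorted arr2, so the nearest neighbour must be either just left or just right of that point.

```python
import numpy as np


def match_within_tolerance(arr1, arr2, tol=0.375):
    """Hypothetical helper: for each value in arr1, find the nearest value in
    arr2 via binary search on a sorted copy of arr2 (assumes arr2 is non-empty).
    Returns the indices into arr1 that have a match closer than tol, and the
    matched arr2 values."""
    arr1 = np.asarray(arr1, dtype=float)
    arr2_sorted = np.sort(np.asarray(arr2, dtype=float))  # O(n log n)

    # Insertion points keeping arr2_sorted ordered: O(m log n) in total
    idx = np.searchsorted(arr2_sorted, arr1)

    # Candidate neighbours on each side, clipped so they are valid indices
    last = len(arr2_sorted) - 1
    left = np.clip(idx - 1, 0, last)
    right = np.clip(idx, 0, last)

    # Keep whichever neighbour is closer
    left_dist = np.abs(arr1 - arr2_sorted[left])
    right_dist = np.abs(arr1 - arr2_sorted[right])
    nearest = np.where(left_dist <= right_dist, left, right)
    dist = np.minimum(left_dist, right_dist)

    # Same strict "< tol" test as the loop in the question
    mask = dist < tol
    return np.nonzero(mask)[0], arr2_sorted[nearest[mask]]


idx, vals = match_within_tolerance(np.array([0.0, 1.0, 5.0]),
                                   np.array([4.9, 0.2, 2.0]))
print(idx, vals)
```

Here 0.0 matches 0.2 and 5.0 matches 4.9, while 1.0 has no arr2 value within 0.375. Only the two neighbours around each insertion point need to be checked, because in a sorted array every other element is at least as far away as one of them. If you need positions in the original, unsorted arr2 rather than the values, you could sort with `np.argsort` instead and map `nearest` back through that permutation.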