I have two arrays, which can look like this:
X = np.array([ 157, 262, 368, 472, 577, 682, 786, 891, 996, 1100, 1204,
1310, 1415, 1520, 1625, 1731, 1879])
Y = np.array([ 30, 135, 240, 345, 450, 555, 660, 765, 870, 975, 1080,
1185, 1290, 1395, 1500, 1605])
The arrays will:
- Have values sorted in ascending order from start.
- Be of unequal length at times.
I want to interleave these two into a new array Z based on the following:
- Each element may only be used once
- All elements need not be used
- An element Xi may only be included in Z if there is an element Yj in Y such that no other element in Y is closer to Xi than abs(Xi - Yj), and no element in X is closer to Yj than abs(Xi - Yj). (The same rule applies to elements in Y.)
I see that I can do this with a bunch of nested for loops, but I wonder if there is some smarter, neater way of doing this?
(I realize that, the way I put the question, it sounds like it was cut from a textbook. It is not. Maybe it is a classic sort function, who knows, but for me as a biologist… all I can say is I’m at a loss as to how to solve it in an efficient, neat way.)
Edit: Not so pretty example
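new_list = list()
for i in X:
    # distance from this X value to every Y value
    delta_i = np.abs(Y - i)
    # distance from the closest Y value back to every X value
    delta_reciprocal = np.abs(X - Y[delta_i.argmin()])
    # keep the pair only if each value is the other's nearest neighbour
    if delta_i.min() == delta_reciprocal.min():
        new_list += sorted([Y[delta_i.argmin()],
                            X[delta_reciprocal.argmin()]])
Z = np.array(new_list)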
I’m not even totally sure it fulfills all the criteria, but when rewriting the old code I got down to just one loop… still there must be some nicer way!
Let’s try to work out the solution for this example:
We can compute all the distances between values in X and values in Y as a 2D array, dist, holding the absolute difference of every X value with every Y value. The rows correspond to X values, the columns correspond to Y values.
To find the X values which are closest to some element in Y, we are looking for the X which corresponds to a minimum value in a column of the dist matrix. Each column corresponds to a particular Y, so the minimum distance in a column corresponds to the minimum between some X and a particular Y.
Visually speaking, what we are looking for are values in dist which are minimums for both the row that they are in, and the column that they are in. Let’s call them “row-column minimums”.
In the dist array above, 40 is a row-column minimum. 65 is a column-minimum, but not a row-column minimum.
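To make the distinction concrete, here is a minimal sketch with small made-up arrays (an assumption for illustration, not the arrays from the example above). They are chosen so that 40 is again a row-column minimum and 65 is only a column minimum. The distance matrix is built with NumPy broadcasting:

```python
import numpy as np

# Made-up arrays for illustration only (not the original example data)
X = np.array([100, 200, 300])
Y = np.array([140, 290, 365])

# X as a column of shape (3, 1) broadcast against Y of shape (3,)
# gives a (3, 3) array: rows follow X, columns follow Y.
dist = np.abs(X[:, np.newaxis] - Y)
print(dist)
# [[ 40 190 265]
#  [ 60  90 165]
#  [160  10  65]]
```

Here 40 is the smallest value in both row 0 and column 0. 65 is the smallest value in column 2, but row 2 also contains 10, so 65 is not a row-column minimum.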
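For each column, we can find the X-index which minimizes the column by taking the argmin of dist along axis 0; call the result idx1.
Similarly, for each row, we can find the Y-index by taking the argmin of dist along axis 1; call the result idx2.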
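A short sketch of these two lookups, using np.argmin on small made-up arrays (the data is an assumption for illustration; the names idx1 and idx2 follow the text):

```python
import numpy as np

# Made-up arrays for illustration only
X = np.array([100, 200, 300])
Y = np.array([140, 290, 365])
dist = np.abs(X[:, np.newaxis] - Y)  # rows follow X, columns follow Y

# For each column (each Y value): index of the closest X value
idx1 = np.argmin(dist, axis=0)
# For each row (each X value): index of the closest Y value
idx2 = np.argmin(dist, axis=1)

print(idx1)  # [0 2 2]
print(idx2)  # [0 0 1]
```

Note that idx1 has one entry per Y value and idx2 has one entry per X value, so the two can have different lengths when X and Y do.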
Now, let’s forget about this example for a second and suppose idx1 had the value 2 at index 5. This is saying that in the 5th column, row 2 has the minimum value.
Then if row 2, column 5 were to correspond to a row-column minimum, idx2 would have to have the value 5 at index 2.
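We can express this relationship in NumPy by checking, for every row index i, whether idx1[idx2[i]] equals i.
The X values for which this holds, together with the Y values that idx2 pairs them with, are the ones which correspond to row-column minimums.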
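A minimal sketch of that check and the resulting selection, again on made-up arrays. The name mask and the final sorting step that builds Z are assumptions for illustration, not something the text above specifies:

```python
import numpy as np

# Made-up arrays for illustration only
X = np.array([100, 200, 300])
Y = np.array([140, 290, 365])
dist = np.abs(X[:, np.newaxis] - Y)
idx1 = np.argmin(dist, axis=0)  # closest X index for each Y
idx2 = np.argmin(dist, axis=1)  # closest Y index for each X

# Row i is a row-column minimum if the Y it points to points back at i
mask = idx1[idx2] == np.arange(len(X))

x_sel = X[mask]        # [100 300]
y_sel = Y[idx2[mask]]  # [140 290]

# One possible way to interleave the matched values into a single array
Z = np.sort(np.concatenate([x_sel, y_sel]))
print(Z)  # [100 140 290 300]
```

In this example 200 is dropped because its nearest Y value, 140, is closer to 100, and 365 is dropped because its nearest X value, 300, is closer to 290.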
Timeit results:
So alt_find_close is significantly faster than find_close.
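For reference, here is a sketch of how the two approaches could be wrapped as functions and timed. The function names come from the text, but the bodies are assumptions: find_close follows the loop from the question, and alt_find_close follows the argmin steps described above. The original definitions may differ.

```python
import timeit
import numpy as np

X = np.array([157, 262, 368, 472, 577, 682, 786, 891, 996, 1100, 1204,
              1310, 1415, 1520, 1625, 1731, 1879])
Y = np.array([30, 135, 240, 345, 450, 555, 660, 765, 870, 975, 1080,
              1185, 1290, 1395, 1500, 1605])

def find_close(X, Y):
    """Loop version, assumed to match the code in the question."""
    new_list = []
    for i in X:
        delta_i = np.abs(Y - i)
        delta_reciprocal = np.abs(X - Y[delta_i.argmin()])
        if delta_i.min() == delta_reciprocal.min():
            new_list += sorted([Y[delta_i.argmin()],
                                X[delta_reciprocal.argmin()]])
    return np.array(new_list)

def alt_find_close(X, Y):
    """Vectorized version, assumed from the argmin steps in the answer."""
    dist = np.abs(X[:, np.newaxis] - Y)
    idx1 = np.argmin(dist, axis=0)
    idx2 = np.argmin(dist, axis=1)
    mask = idx1[idx2] == np.arange(len(X))
    return X[mask], Y[idx2[mask]]

xs, ys = alt_find_close(X, Y)
print(xs)
print(ys)

# Timings depend on the machine, so no numbers are claimed here
for f in (find_close, alt_find_close):
    print(f.__name__, timeit.timeit(lambda: f(X, Y), number=1000))
```

On the arrays from the question, both functions select the same pairs: 1731 and 1879 from X and 30 from Y are left out. The vectorized version builds a full len(X) by len(Y) distance matrix, which trades memory for speed.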