I have 2 arrays in 2D, where the column vectors are feature vectors. One

Question

0

Asked: June 5, 20262026-06-05T23:16:06+00:00 2026-06-05T23:16:06+00:00

I have 2 arrays in 2D, where the column vectors are feature vectors. One

0

I have 2 arrays in 2D, where the column vectors are feature vectors. One array is of size F x A, the other of F x B, where A << B. As an example, for A = 2 and F = 3 (B can be anything):

arr1 = np.array( [[1, 4],
                  [2, 5],
                  [3, 6]] )

arr2 = np.array( [[1, 4, 7, 10, ..],
                  [2, 5, 8, 11, ..],
                  [3, 6, 9, 12, ..]] )

I want to calculate the distance between arr1 and a fragment of arr2 that is of equal size (in this case, 3×2), for each possible fragment of arr2. The column vectors are independent of each other, so I believe I should calculate the distance between each column vector in arr1 and a collection of column vectors ranging from i to i + A from arr2 and take the sum of these distances (not sure though).

Does numpy offer an efficient way of doing this, or will I have to take slices from the second array and, using another loop, calculate the distance between each column vector in arr1 and the corresponding column vector in the slice?

Example for clarity, using the arrays stated above:

>>> magical_distance_func(arr1, arr2[:,:2])
[0, 10.3923..]
>>> # First, distance between arr2[:,:2] and arr1, which equals 0.
>>> # Second, distance between arr2[:,1:3] and arr1, which equals 
>>> diff = arr1 - np.array( [[4,7],[5,8],[6,9]] )
>>> diff
[[-3, -3], [-3, -3], [-3, -3]]
>>> # this happens to consist only of -3's. Norm of each column vector is:
>>> norm1 = np.linalg.norm([:,0])
>>> norm2 = np.linalg.norm([:,1])
>>> # would be extremely good if this worked for an arbitrary number of norms
>>> totaldist = norm1 + norm2
>>> totaldist
10.3923...

Of course, transposing the arrays is fine too, if that means that cdist can somehow be used here.

Report

Leave an answer
Cancel reply

You must login to add an answer.

Need An Account,

1 Answer

Editorial Team · Answer 1 · 2026-06-05T23:16:08+00:00

If I understand your question correctly, this will work. Knowing numpy, there’s probably a better way, but this is at least fairly straightforward. I used some contrived coordinates to show that the calculation is working as expected.

>>> arr1
array([[0, 3],
       [1, 4],
       [2, 5]])
>>> arr2
array([[ 3,  6,  5,  8],
       [ 5,  8, 13, 16],
       [ 2,  5,  2,  5]])

You can subtract arr1 from arr2 by ensuring that they broadcast against each other correctly. The best way I could think of involves taking a transpose and doing some reshaping. These don’t create copies — they create views — so this isn’t so wasteful. (dist is a copy though.)

>>> dist = (arr2.T.reshape((2, 2, 3)) - arr1.T).reshape((4, 3))
>>> dist
array([[ 3,  4,  0],
       [ 3,  4,  0],
       [ 5, 12,  0],
       [ 5, 12,  0]])

Now all we have to do is apply numpy.linalg.norm across axis 1. (You can select from among several norms).

>>> numpy.apply_along_axis(numpy.linalg.norm, 1, dist)
array([  5.,   5.,  13.,  13.])

Assuming you want simple euclidean distance, you can also do it directly; not sure whether this will be faster or slower so try both:

>>> (dist ** 2).sum(axis=1) ** 0.5
array([  5.,   5.,  13.,  13.])

Based on your edit, we have to do only one small tweak. Since you want to test the columns pairwise, rather than blockwise, you need a rolling window. This can be done very simply with fairly straightforward indexing:

>>> arr2.T[numpy.array(zip(range(0, 3), range(1, 4)))]
array([[[ 3,  5,  2],
        [ 6,  8,  5]],

       [[ 6,  8,  5],
        [ 5, 13,  2]],

       [[ 5, 13,  2],
        [ 8, 16,  5]]])

Combining that with the other tricks:

>>> arr2_pairs = arr2.T[numpy.array(zip(range(0, 3), range(1, 4)))]
>>> dist = arr2_pairs - arr1.T
>>> (dist ** 2).sum(axis=2) ** 0.5
array([[  5.        ,   5.        ],
       [  9.69535971,   9.69535971],
       [ 13.        ,  13.        ]])

However, converting arrays from list comprehensions tends to be slow. It might be faster to use stride_tricks — here again, see which one suits your purposes best:

>>> as_strided(arr2.T, strides=(8, 8, 32), shape=(3, 2, 3))
array([[[ 3,  5,  2],
        [ 6,  8,  5]],

       [[ 6,  8,  5],
        [ 5, 13,  2]],

       [[ 5, 13,  2],
        [ 8, 16,  5]]])

This actually manipulates the way numpy moves over a block of memory, allowing a small array to emulate a bigger array.

>>> arr2_pairs = as_strided(arr2.T, strides=(8, 8, 32), shape=(3, 2, 3))
>>> dist = arr2_pairs - arr1.T
>>> (dist ** 2).sum(axis=2) ** 0.5
array([[  5.        ,   5.        ],
       [  9.69535971,   9.69535971],
       [ 13.        ,  13.        ]])

So now you have a simple 2-d array corresponding to distances for each pair of columns. Now it’s just a matter of getting the mean and calling argmin.

>>> normed = (dist ** 2).sum(axis=2) ** 0.5
>>> normed.mean(axis=1)
array([  5.        ,   9.69535971,  13.        ])
>>> min_window = normed.mean(axis=1).argmin()
>>> arr2[:,[min_window, min_window + 1]]
array([[3, 6],
       [5, 8],
       [2, 5]])

Sign Up

Sign In

Forgot Password

The Archive Base Latest Questions

I have 2 arrays in 2D, where the column vectors are feature vectors. One

Leave an answerCancel reply

1 Answer

Leave an answer
Cancel reply