I have two sets of vectors, set A and set B. Let’s say set A contains 100 vectors and set B contains 50 vectors. I have my own way of measuring the distance between any two vectors. The objective is to pair each vector in set A with a vector in set B whose distance from it is within a particular threshold. If the distance between two vectors exceeds the threshold, they cannot be paired. The mapping is one-to-one, i.e. a vector in set A can be mapped to at most one vector in set B and vice versa.
So, it may happen that finally, 40 vectors from set A are mapped to 40 vectors in set B. Thus, 60 vectors in set A are not paired with any vectors in set B. Hence, 10 vectors in set B are also left unpaired.
Now, if I label the vectors in set A as A1, A2, A3 … A100 and the vectors in set B as B1, B2, B3 … and so on, what is the most efficient way of iterating through the two sets and doing this pairing?
Please let me know if it requires additional clarifications.
First, work out which vectors from A can be paired with which vectors in B. This takes O(|A|·|B|) distance computations (O(n^2) if both sets have about n vectors) and gives you a bipartite graph: the two partitions of vertices are the vectors in A and the vectors in B, and there is an edge between a vector from A and a vector from B if and only if their distance is within the threshold.
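To make the graph-building step concrete, here is a minimal sketch in Python. The function name `build_candidate_graph` and its parameters are made up for illustration; `dist` stands in for whatever distance measure you already have, and the graph is stored as an adjacency list indexed by position in A.

```python
from typing import Callable, List, Sequence, TypeVar

V = TypeVar("V")


def build_candidate_graph(
    set_a: Sequence[V],
    set_b: Sequence[V],
    dist: Callable[[V, V], float],   # assumption: your own distance function
    threshold: float,
) -> List[List[int]]:
    """Return adj, where adj[i] lists the indices j in set_b that set_a[i]
    may be paired with, i.e. dist(set_a[i], set_b[j]) <= threshold."""
    adj: List[List[int]] = []
    for a in set_a:
        # One distance evaluation per (a, b) pair: |A| * |B| in total.
        adj.append([j for j, b in enumerate(set_b) if dist(a, b) <= threshold])
    return adj
```

For example, with scalars standing in for vectors, `build_candidate_graph([0.0, 1.0, 5.0], [0.4, 5.2], lambda x, y: abs(x - y), 0.5)` gives one candidate for the first and third elements and none for the second.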
After you have built the graph, you need to find a maximum bipartite matching, which is usually done with a maximum-flow algorithm. Take a look here for instance. I personally use Dinitz's algorithm (also known as Dinic's algorithm) for the flow.
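As a rough illustration of the matching step, the sketch below uses the simple augmenting-path method (often called Kuhn's algorithm) rather than Dinitz's algorithm. It is shorter to write, runs in O(V·E), and should be fast enough for sets of this size. The function name and the adjacency-list input format (`adj[i]` holds the indices in B that A's i-th vector may pair with) are assumptions made for this example.

```python
from typing import List


def max_bipartite_matching(adj: List[List[int]], n_b: int) -> List[int]:
    """Return match_a, where match_a[i] is the index in B paired with
    A's i-th vector, or -1 if that vector stays unpaired."""
    match_b = [-1] * n_b  # match_b[j] = index in A currently paired with j

    def try_augment(i: int, visited: List[bool]) -> bool:
        # Try to pair i, re-routing earlier pairs along an augmenting path.
        for j in adj[i]:
            if visited[j]:
                continue
            visited[j] = True
            if match_b[j] == -1 or try_augment(match_b[j], visited):
                match_b[j] = i
                return True
        return False

    for i in range(len(adj)):
        try_augment(i, [False] * n_b)

    match_a = [-1] * len(adj)
    for j, i in enumerate(match_b):
        if i != -1:
            match_a[i] = j
    return match_a
```

Note that a maximum matching maximises the number of pairs and ignores how close each pair is, as long as the distance is within the threshold. The recursion depth is bounded by the size of the smaller set, so Python's default recursion limit is not a concern for 100 and 50 vectors.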
Hope this helps.