I am trying to extract semantics from graphical xy plots where the points are plotted and some or all of them have a label. The label is plotted “near the point” so that a human can normally understand which label goes with which point. For example, in this plot it is clear which label (number) belongs to which point (*), and an algorithm based on Euclidean distance would work. (The labels and points have no semantic ordering, e.g. a scatterplot.)
*1
*2
*3
*4
In congested plots the authoring software/human may place the label in different directions to avoid overlap. For example in
1**2
 **4
 3
A human reader can normally work out which label is associated with which point.
One solution I’d accept would be to create a Euclidean distance matrix and shuffle the rows to get the minimum of a function (e.g. the summed squares of the distances on the diagonal or other heuristic). In the second example (with the points labelled a,b,c,d clockwise from the NW corner) we have a distance matrix (to 1 d.p.)
1ab2
 dc4
 3

     a    b    c    d
1   1.0  2.0  2.2  1.4
2   2.0  1.0  1.4  2.2
3   2.0  2.2  1.4  1.0
4   2.2  1.4  1.0  2.0
and we need to label a1 b2 c4 d3. Swapping rows 3 and 4 gives the minimum sum of the diagonal. Here’s a more complex example where simply picking the nearest may fail
*1*2*5
**4
3 *6
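The row-shuffling idea described above for the 4x4 distance matrix can be sketched as a brute-force search over all permutations. This is only an illustration under my own assumptions: the class and method names are made up, the cost is the plain sum of the chosen distances (squaring them would be a one-line change), and trying every permutation is only practical for a handful of labels.

```java
import java.util.Arrays;

final class BruteForceLabelMatch {

    /** Returns best[i] = index of the point assigned to label i, minimising the summed cost. */
    static int[] bestAssignment(double[][] cost) {
        int n = cost.length;
        int[] perm = new int[n];
        for (int i = 0; i < n; i++) {
            perm[i] = i;
        }
        int[] best = perm.clone();
        double[] bestCost = {Double.POSITIVE_INFINITY};
        permute(cost, perm, 0, best, bestCost);
        return best;
    }

    /** Tries every permutation of perm[k..] and remembers the cheapest complete one. */
    private static void permute(double[][] cost, int[] perm, int k, int[] best, double[] bestCost) {
        int n = perm.length;
        if (k == n) {
            double total = 0.0;
            for (int i = 0; i < n; i++) {
                total += cost[i][perm[i]];
            }
            if (total < bestCost[0]) {
                bestCost[0] = total;
                System.arraycopy(perm, 0, best, 0, n);
            }
            return;
        }
        for (int i = k; i < n; i++) {
            swap(perm, k, i);
            permute(cost, perm, k + 1, best, bestCost);
            swap(perm, k, i);
        }
    }

    private static void swap(int[] a, int i, int j) {
        int t = a[i];
        a[i] = a[j];
        a[j] = t;
    }

    public static void main(String[] args) {
        // Rows are labels 1..4, columns are points a..d (the matrix from the question).
        double[][] d = {
            {1.0, 2.0, 2.2, 1.4},
            {2.0, 1.0, 1.4, 2.2},
            {2.0, 2.2, 1.4, 1.0},
            {2.2, 1.4, 1.0, 2.0}
        };
        // Indices 0..3 stand for points a..d; prints [0, 1, 3, 2], i.e. a1 b2 d3 c4.
        System.out.println(Arrays.toString(bestAssignment(d)));
    }
}
```

The search visits n! permutations, so it is useful mainly for checking a faster method on small cases.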
If this is solved then I shall need to go to cases where the number of labels may be smaller or larger than the number of points.
If the algorithm is standard then I would appreciate a pointer to open-source Java (e.g. JAMA or Apache Commons Math).
NOTE: The SO answer “Associating nearby points with a path” doesn’t quite work as an answer because there the path through the points is given.
You have a complete bipartite graph in which one part is the labels (numbers) and the other part is the points. The weight of each edge is the Euclidean distance between a label and a point, and your task is to find a matching with minimal total weight.
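As a rough sketch, the edge weights of that bipartite graph could be built like this. The class name, the method name and the `double[]{x, y}` coordinate layout are assumptions of mine, not something taken from a library.

```java
final class LabelPointWeights {

    /**
     * weights[i][j] = Euclidean distance from label i to point j.
     * Each coordinate is a double[]{x, y}; this layout is an assumption.
     */
    static double[][] distanceMatrix(double[][] labels, double[][] points) {
        double[][] weights = new double[labels.length][points.length];
        for (int i = 0; i < labels.length; i++) {
            for (int j = 0; j < points.length; j++) {
                weights[i][j] = Math.hypot(labels[i][0] - points[j][0],
                                           labels[i][1] - points[j][1]);
            }
        }
        return weights;
    }

    public static void main(String[] args) {
        // Grid positions read off the second plot in the question (x to the right, y downwards).
        double[][] labels = {{0, 0}, {3, 0}, {1, 2}, {3, 1}}; // labels 1, 2, 3, 4
        double[][] points = {{1, 0}, {2, 0}, {2, 1}, {1, 1}}; // points a, b, c, d
        for (double[] row : distanceMatrix(labels, points)) {
            System.out.println(java.util.Arrays.toString(row));
        }
    }
}
```

The matrix does not have to be square, so the same helper works when there are fewer labels than points.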
This is a known problem (the assignment problem) and it has a well-known algorithm, the Hungarian algorithm.
For a detailed description of the algorithm and code, you can take a look at the TopCoder article; this PDF may also be useful. There is also a media file that describes it. (This video explains why the Hungarian algorithm works.)
For details, see the wiki and http://www.ams.jhu.edu/~castello/362/Handouts/hungarian.pdf
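For orientation, here is a minimal Java sketch of one common O(n^2 m) formulation of the Hungarian algorithm (the potentials-based variant). It is not the code from the linked article, and the class and method names are my own. It assumes the cost matrix has no more rows (labels) than columns (points); if there are more labels than points, transpose the matrix first.

```java
import java.util.Arrays;

final class HungarianSketch {

    /** Returns result[i] = column assigned to row i with minimal total cost (rows <= columns). */
    static int[] solve(double[][] cost) {
        int n = cost.length;
        int m = (n == 0) ? 0 : cost[0].length;
        if (n > m) {
            throw new IllegalArgumentException("rows must not exceed columns");
        }
        double[] u = new double[n + 1];   // row potentials (1-based)
        double[] v = new double[m + 1];   // column potentials (1-based)
        int[] p = new int[m + 1];         // p[j] = row matched to column j, 0 if free
        int[] way = new int[m + 1];       // previous column on the augmenting path
        for (int i = 1; i <= n; i++) {
            p[0] = i;                     // column 0 is a virtual start holding the new row
            int j0 = 0;
            double[] minv = new double[m + 1];
            Arrays.fill(minv, Double.POSITIVE_INFINITY);
            boolean[] used = new boolean[m + 1];
            do {
                used[j0] = true;
                int i0 = p[j0];
                double delta = Double.POSITIVE_INFINITY;
                int jNext = 0;
                for (int j = 1; j <= m; j++) {
                    if (!used[j]) {
                        double cur = cost[i0 - 1][j - 1] - u[i0] - v[j];
                        if (cur < minv[j]) {
                            minv[j] = cur;
                            way[j] = j0;
                        }
                        if (minv[j] < delta) {
                            delta = minv[j];
                            jNext = j;
                        }
                    }
                }
                // Shift the potentials so that at least one more edge becomes tight.
                for (int j = 0; j <= m; j++) {
                    if (used[j]) {
                        u[p[j]] += delta;
                        v[j] -= delta;
                    } else {
                        minv[j] -= delta;
                    }
                }
                j0 = jNext;
            } while (p[j0] != 0);
            // Flip the matching along the augmenting path back to the virtual column.
            do {
                int jPrev = way[j0];
                p[j0] = p[jPrev];
                j0 = jPrev;
            } while (j0 != 0);
        }
        int[] result = new int[n];
        for (int j = 1; j <= m; j++) {
            if (p[j] != 0) {
                result[p[j] - 1] = j - 1;
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Rows are labels 1..4, columns are points a..d (the matrix from the question).
        double[][] d = {
            {1.0, 2.0, 2.2, 1.4},
            {2.0, 1.0, 1.4, 2.2},
            {2.0, 2.2, 1.4, 1.0},
            {2.2, 1.4, 1.0, 2.0}
        };
        System.out.println(Arrays.toString(solve(d)));
    }
}
```

With fewer labels than points, the columns left unmatched are simply the unlabelled points.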