I’m looking for Java code (or a library) that calculates the earth mover’s distance (EMD) between two histograms, either directly or indirectly (e.g. using the Hungarian algorithm). I found several implementations of this in C/C++ (e.g. “Fast and Robust Earth Mover’s Distances”), but I’m wondering whether a Java version is readily available.
I will be using the EMD calculation to evaluate the approach given by this paper in the context of a science project I’m working on.
Update
Using a variety of resources, I estimate that the code below should do the trick. determineMinCostAssignment computes the optimal assignment as determined by the Hungarian algorithm; for this I will be using the code from http://konstantinosnedas.com/dev/soft/munkres.htm
My main concern is the calculated flow: I am not sure it is correct. Can someone verify whether it is?
import java.security.InvalidParameterException;

/**
 * Determines the Earth Mover's Distance between two histograms, assuming an equal distance between adjacent buckets of a histogram. The distance between
 * two buckets is equal to the difference in the indexes of the buckets.
 *
 * @param threshold
 *            The maximum distance to use between two buckets.
 */
public static double determineEarthMoversDistance(double[] histogram1, double[] histogram2, int threshold) {
    if (histogram1.length != histogram2.length)
        throw new InvalidParameterException("Each histogram must have the same number of elements");

    // Ground distance between buckets i and j: |i - j|, capped at the threshold.
    double[][] groundDistances = new double[histogram1.length][histogram2.length];
    for (int i = 0; i < histogram1.length; ++i) {
        for (int j = 0; j < histogram2.length; ++j) {
            int absDiff = Math.abs(i - j);
            groundDistances[i][j] = Math.min(absDiff, threshold);
        }
    }

    // Each row of the result is a pair {index in histogram1, index in histogram2}.
    int[][] assignment = determineMinCostAssignment(groundDistances);

    double costSum = 0, flowSum = 0;
    for (int i = 0; i < assignment.length; i++) {
        double cost = groundDistances[assignment[i][0]][assignment[i][1]];
        // The flow is taken to be the mass of the assigned bucket in histogram2.
        double flow = histogram2[assignment[i][1]];
        costSum += cost * flow;
        flowSum += flow;
    }
    return costSum / flowSum;
}
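One way to sanity-check the flow calculation is to compare against a case where the EMD has a simple closed form. For one-dimensional histograms with the same total mass and the plain ground distance |i - j| (no threshold), the EMD equals the sum of the absolute differences of the running totals, divided by the total mass. The sketch below assumes exactly that setting; the class and method names (Emd1D, emd1d) are made up for illustration and are not part of any library. It is also worth noting that the ground-distance matrix in the method above has a zero diagonal, so a one-to-one assignment computed on that matrix alone can always pick the identity pairing at zero cost, whatever the histograms contain.

```java
final class Emd1D {

    /**
     * EMD between two 1-D histograms, assuming (these are assumptions of this sketch):
     * equal length, equal total mass, ground distance |i - j|, no threshold.
     */
    static double emd1d(double[] h1, double[] h2) {
        if (h1.length != h2.length) {
            throw new IllegalArgumentException("Histograms must have the same number of buckets");
        }
        double total1 = 0, total2 = 0;
        for (int i = 0; i < h1.length; i++) {
            total1 += h1[i];
            total2 += h2[i];
        }
        if (Math.abs(total1 - total2) > 1e-9 * Math.max(1.0, Math.abs(total1))) {
            throw new IllegalArgumentException("This shortcut only holds for equal total mass");
        }
        if (total1 == 0) {
            return 0;
        }
        double carried = 0; // mass that still has to move past bucket i to the right (negative: to the left)
        double work = 0;    // total mass * distance moved so far
        for (int i = 0; i < h1.length; i++) {
            carried += h1[i] - h2[i];
            work += Math.abs(carried);
        }
        return work / total1; // normalise by the total flow
    }
}
```

For example, Emd1D.emd1d(new double[] {1, 0, 0}, new double[] {0, 0, 1}) gives 2.0, since one unit of mass moves two buckets. If an assignment-based implementation returns something different on inputs like this (with the threshold set at least as large as the number of buckets), the flow computation is the first place to look.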
Here’s a pure Java port of the FastEMD algorithm that I just released:
https://github.com/telmomenezes/JFastEMD