I have a double array x and a double array y. Both can have duplicate elements.
const double MAX = 10000.0;
const int X_LENGTH = 10000;
const int Y_LENGTH = 10000;
const double TOLERANCE = 0.01;
Random random = new Random();
double[] x = new double[X_LENGTH];
for(int i = 0; i < X_LENGTH; i++)
{
    x[i] = MAX * random.NextDouble();
}
double[] y = new double[Y_LENGTH];
for(int j = 0; j < Y_LENGTH; j++)
{
    y[j] = MAX * random.NextDouble();
}
I am trying to count how many elements in array x are found in array y within a tolerance, and how many elements in array y are found in array x within the same tolerance. Note that these numbers can be different: with x = {1.000, 1.005}, y = {1.000} and a tolerance of 0.01, both elements of x have a match but only one element of y does. The simplest way to do this is with two pairs of nested loops:
int x_matches = 0;
for(int i = 0; i < X_LENGTH; i++)
{
    for(int j = 0; j < Y_LENGTH; j++)
    {
        if(Math.Abs(x[i] - y[j]) <= TOLERANCE)
        {
            x_matches++;
            break;
        }
    }
}
int y_matches = 0;
for(int j = 0; j < Y_LENGTH; j++)
{
    for(int i = 0; i < X_LENGTH; i++)
    {
        if(Math.Abs(x[i] - y[j]) <= TOLERANCE)
        {
            y_matches++;
            break;
        }
    }
}
However, this code is run thousands of times and is the main bottleneck in the software. I am trying to speed it up. I have already optimized it by sorting both arrays first and then walking through them with two independently advancing indices.
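Array.Sort(x);
Array.Sort(y);
int x_matches_2 = 0;
int i2 = 0;
int j2 = 0;
while(i2 < X_LENGTH && j2 < Y_LENGTH)
{
    if(Math.Abs(x[i2] - y[j2]) <= TOLERANCE)
    {
        // x[i2] has a match; keep j2 in place because y[j2] may also match the next x
        x_matches_2++;
        i2++;
    }
    else if(x[i2] < y[j2])
    {
        // every remaining y is even larger, so x[i2] has no match
        i2++;
    }
    else if(x[i2] > y[j2])
    {
        // y[j2] is too small for this x and for every later x
        j2++;
    }
}
int y_matches_2 = 0;
int i3 = 0;
int j3 = 0;
while(i3 < X_LENGTH && j3 < Y_LENGTH)
{
    if(Math.Abs(x[i3] - y[j3]) <= TOLERANCE)
    {
        // y[j3] has a match; keep i3 in place because x[i3] may also match the next y
        y_matches_2++;
        j3++;
    }
    else if(x[i3] < y[j3])
    {
        i3++;
    }
    else if(x[i3] > y[j3])
    {
        j3++;
    }
}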
I am wondering if anybody knows of a way to merge these two loops into one and still obtain the same answer. I can only come up with this:
int x_matches_3 = 0;
int y_matches_3 = 0;
bool[] y_matched = new bool[Y_LENGTH]; // y_matched[j] is true once y[j] has been counted
for(int i = 0; i < X_LENGTH; i++)
{
    bool x_matched = false;
    for(int j = 0; j < Y_LENGTH; j++)
    {
        if(Math.Abs(x[i] - y[j]) <= TOLERANCE)
        {
            if(!x_matched)
            {
                x_matches_3++;
                x_matched = true;
            }
            if(!y_matched[j])
            {
                y_matches_3++;
                y_matched[j] = true;
            }
        }
    }
}
It doesn’t require sorting; however, it ends up being slower because the inner loop can never break early, so all X_LENGTH * Y_LENGTH comparisons are always performed.
P.S. This is an oversimplification of my actual problem, but I think the solution to this will apply to both.
It is possible to have a single loop, but you will process some part of each array more than once.
Some odd results you’ll get: if you have 2 arrays of 2 elements each, it’s possible to get more than 2 matches from x to y or from y to x. This happens because x[0] can match both y[0] and y[1], and so can x[1]. That way, you’d end up with 4 matches in both “directions”. For example, when I ran this code with 2 arrays of 1000 items each, I had 1048 matches in one and 978 in the other. I hope it helps.
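If each element should be counted at most once, as in the question's nested loops, one alternative is a single merge-style pass over the two sorted arrays. The sketch below is only an illustration under assumptions, not the code this answer refers to: the names ToleranceMatcher and CountMatches are made up, and it relies on the fact that in sorted data an element has a match exactly when its nearest neighbour in the other array (just below or just above) is within the tolerance.

```csharp
using System;

public static class ToleranceMatcher
{
    // Hypothetical helper: counts the elements of x that have at least one y
    // within tolerance, and the elements of y that have at least one x within
    // tolerance. Note: sorts both arrays in place.
    public static (int XMatches, int YMatches) CountMatches(double[] x, double[] y, double tolerance)
    {
        Array.Sort(x);
        Array.Sort(y);
        int xMatches = 0, yMatches = 0;
        int i = 0, j = 0;
        // Merge-style walk: each iteration handles the smaller of x[i] and y[j].
        while (i < x.Length || j < y.Length)
        {
            bool takeX = j == y.Length || (i < x.Length && x[i] <= y[j]);
            if (takeX)
            {
                // y[j - 1] is the closest y below x[i]; y[j] is the closest y at or above it.
                bool below = j > 0 && x[i] - y[j - 1] <= tolerance;
                bool above = j < y.Length && y[j] - x[i] <= tolerance;
                if (below || above) xMatches++;
                i++;
            }
            else
            {
                // x[i - 1] is the closest x at or below y[j]; x[i] is the closest x above it.
                bool below = i > 0 && y[j] - x[i - 1] <= tolerance;
                bool above = i < x.Length && x[i] - y[j] <= tolerance;
                if (below || above) yMatches++;
                j++;
            }
        }
        return (xMatches, yMatches);
    }
}
```

With x = {1.000, 1.005}, y = {0.999, 5.0} and a tolerance of 0.01, this returns 2 matches for x and 1 for y. After the two sorts, the pass itself visits every element exactly once.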
Edit: Here is a generic version:
With an example of how you’d call it for int:
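A minimal sketch of what a generic single-pass counter and an int call could look like is given here. It is an illustration under assumptions rather than the original listing: GenericToleranceMatcher, CountMatches, isMatch and IntExample are hypothetical names, and the match predicate is assumed to be symmetric and consistent with the sort order (if a <= b <= c and a matches c, then b matches both a and c), so that checking only the nearest neighbours is enough.

```csharp
using System;

public static class GenericToleranceMatcher
{
    // Hypothetical generic counter. isMatch decides whether two values are
    // "within tolerance". Note: sorts both arrays in place.
    public static (int XMatches, int YMatches) CountMatches<T>(
        T[] x, T[] y, Func<T, T, bool> isMatch) where T : IComparable<T>
    {
        Array.Sort(x);
        Array.Sort(y);
        int xMatches = 0, yMatches = 0;
        int i = 0, j = 0;
        // Merge-style walk: each iteration handles the smaller of x[i] and y[j].
        while (i < x.Length || j < y.Length)
        {
            bool takeX = j == y.Length || (i < x.Length && x[i].CompareTo(y[j]) <= 0);
            if (takeX)
            {
                // Only the nearest y on each side of x[i] needs to be tested.
                bool below = j > 0 && isMatch(x[i], y[j - 1]);
                bool above = j < y.Length && isMatch(x[i], y[j]);
                if (below || above) xMatches++;
                i++;
            }
            else
            {
                // Only the nearest x on each side of y[j] needs to be tested.
                bool below = i > 0 && isMatch(y[j], x[i - 1]);
                bool above = i < x.Length && isMatch(y[j], x[i]);
                if (below || above) yMatches++;
                j++;
            }
        }
        return (xMatches, yMatches);
    }

    // Example call for int with a tolerance of 2.
    public static (int XMatches, int YMatches) IntExample()
    {
        int[] a = { 1, 2, 10 };
        int[] b = { 3, 20 };
        // 1 and 2 are within 2 of 3, so a has 2 matches; only 3 has a partner in a.
        return CountMatches(a, b, (p, q) => Math.Abs(p - q) <= 2); // (2, 1)
    }
}
```

Passing the predicate as a delegate keeps the method usable for any IComparable<T> type, at the cost of a delegate call per comparison; for a hot path on double[] a non-generic version would likely be faster.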