How do I compare the similarity between two arrays? Say I have:
Base Array: [.5,0,0,0,.25,0,0,.25,0,0,0,0]
Array 1: [1,0,0,0,1,0,0,1,0,0,0,0]
Array 2: [0,0,1,0,0,0,1,0,0,1,0,0]
Array 3: [1,0,0,0,0,0,0,0,0,0,0,0]
For the arrays above, the answer should be Array 1, because its elements are ‘closer’ in structure to those of the base array. Unlike Array 3, Array 1 has nonzero entries wherever the base array has .25, and I want .25 to count as closer to 1 than to 0. Another example:
Base Array: [.75,0,0,0,0,0,0,0,.25,0,0,0]
Array 1: [1,0,0,0,1,0,0,1,0,0,0,0]
Array 2: [0,0,1,0,0,0,1,0,0,1,0,0]
Array 3: [1,0,0,0,0,0,0,0,0,0,0,0]
In this case, Array 3 should be the answer.
However, with my current algorithm the first example also comes out as Array 3 instead of Array 1. Here is what I am using:
double dist = 0; // running sum of squared differences
for (int i = 0; i < basearray.Length; i++)
{
    double temp = basearray[i] - arrayX[i]; // element-wise difference
    dist += temp * temp;
}
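To check that the loop computes what it claims, here is a minimal Java port of it (Java rather than C# only so it runs standalone; the class and method names are hypothetical). It scores the three candidates from the first example.

```java
import java.util.Locale;

// Minimal Java port of the question's loop; class and method names are hypothetical.
public class SquaredDistanceDemo {

    // Sum of squared element-wise differences (squared Euclidean distance).
    static double squaredDistance(double[] base, double[] x) {
        double dist = 0;
        for (int i = 0; i < base.length; i++) {
            double temp = base[i] - x[i];
            dist += temp * temp;
        }
        return dist;
    }

    public static void main(String[] args) {
        double[] base = {.5, 0, 0, 0, .25, 0, 0, .25, 0, 0, 0, 0};
        double[][] candidates = {
            {1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0}, // Array 1
            {0, 0, 1, 0, 0, 0, 1, 0, 0, 1, 0, 0}, // Array 2
            {1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0}  // Array 3
        };
        for (int k = 0; k < candidates.length; k++) {
            System.out.printf(Locale.ROOT, "Array %d: %.4f%n", k + 1,
                    squaredDistance(base, candidates[k]));
        }
    }
}
```

It prints 1.3750, 3.3750 and 0.3750 for Arrays 1, 2 and 3. The loop is a correct squared Euclidean distance and really does rank Array 3 closest, so the mismatch comes from the choice of measure rather than from a bug in the loop.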
So is something wrong with my algorithm? Or do I need a ‘different’ kind of measure than distance? After all, .25 IS closer to 0 than to 1, but what I want is the opposite.
Thanks!
UPDATE:
I found the answer! Thanks to everyone for the help. Here it is:
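using System.Linq; // for Sum()

float dist1 = 0, dist2 = 0; // final distances for compArr1 and compArr2
// Scale the base array so its elements sum to 1 (assumes a nonzero sum)
float baseSum = baseArrX.Sum();
float[] pbaseArrX = new float[baseArrX.Count];
for (int i = 0; i < baseArrX.Count; i++)
{
    pbaseArrX[i] = baseArrX[i] / baseSum;
}
//Do the following for both compArr1 and compArr2:
float compSum = compArrX.Sum();
float[] pcompArrX = new float[compArrX.Count];
for (int i = 0; i < compArrX.Count; i++)
{
    pcompArrX[i] = compArrX[i] / compSum;
}
//Get distance for both (assign distX to dist1 or dist2)
float distX = 0;
for (int i = 0; i < pcompArrX.Length; i++)
{
    float d = pcompArrX[i] - pbaseArrX[i];
    distX += d * d; // ^ is XOR in C#, not a power operator
}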
//Then just use conditional to determine which is 'closer'
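Here is a minimal Java sketch of the approach in the update, assuming the intent is to scale each array so its elements sum to 1 and then take the squared Euclidean distance. The class and method names are hypothetical, and it assumes no array sums to zero.

```java
// Sketch of the update's idea: normalize to sum 1, then squared Euclidean distance.
// Class and method names are hypothetical.
public class NormalizedDistanceDemo {

    // Returns a copy of a whose elements sum to 1 (assumes the sum is nonzero).
    static double[] normalize(double[] a) {
        double sum = 0;
        for (double v : a) sum += v;
        double[] p = new double[a.length];
        for (int i = 0; i < a.length; i++) p[i] = a[i] / sum;
        return p;
    }

    static double squaredDistance(double[] a, double[] b) {
        double dist = 0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            dist += d * d;
        }
        return dist;
    }

    // 0-based index of the candidate whose normalized form is closest to the normalized base.
    static int closest(double[] base, double[][] candidates) {
        double[] pBase = normalize(base);
        int best = 0;
        double bestDist = Double.POSITIVE_INFINITY;
        for (int k = 0; k < candidates.length; k++) {
            double d = squaredDistance(pBase, normalize(candidates[k]));
            if (d < bestDist) {
                bestDist = d;
                best = k;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        double[][] candidates = {
            {1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0}, // Array 1
            {0, 0, 1, 0, 0, 0, 1, 0, 0, 1, 0, 0}, // Array 2
            {1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0}  // Array 3
        };
        double[] base1 = {.5, 0, 0, 0, .25, 0, 0, .25, 0, 0, 0, 0};
        double[] base2 = {.75, 0, 0, 0, 0, 0, 0, 0, .25, 0, 0, 0};
        System.out.println("Example 1 -> Array " + (closest(base1, candidates) + 1));
        System.out.println("Example 2 -> Array " + (closest(base2, candidates) + 1));
    }
}
```

On the two examples it picks Array 1 and Array 3 respectively, matching the expected answers.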
It seems like you want to compare the arrays as rays (direction only), but you’re comparing them as vectors (direction and magnitude). I’d suggest comparing the arrays with cosine similarity, which is the cosine of the angle between the vectors and therefore compares only their directions. For the first example, the cosine similarity between the base array and Array 1 is 0.94, while that with Array 3 is 0.82, matching your expectations; for the second example, Array 3 (0.95) beats Array 1 (0.55).
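A minimal Java sketch of cosine similarity applied to both examples (class and method names are hypothetical; it assumes neither array is all zeros, since the cosine is undefined there).

```java
import java.util.Locale;

// Minimal sketch of cosine similarity; class and method names are hypothetical.
public class CosineSimilarityDemo {

    // cos(theta) = (a . b) / (|a| * |b|); assumes neither array is all zeros.
    static double cosineSimilarity(double[] a, double[] b) {
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    public static void main(String[] args) {
        double[][] bases = {
            {.5, 0, 0, 0, .25, 0, 0, .25, 0, 0, 0, 0}, // first example
            {.75, 0, 0, 0, 0, 0, 0, 0, .25, 0, 0, 0}   // second example
        };
        double[][] candidates = {
            {1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0}, // Array 1
            {0, 0, 1, 0, 0, 0, 1, 0, 0, 1, 0, 0}, // Array 2
            {1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0}  // Array 3
        };
        for (int e = 0; e < bases.length; e++) {
            for (int k = 0; k < candidates.length; k++) {
                System.out.printf(Locale.ROOT, "Example %d, Array %d: %.2f%n",
                        e + 1, k + 1, cosineSimilarity(bases[e], candidates[k]));
            }
        }
    }
}
```

It prints 0.94, 0.00 and 0.82 for the first example and 0.55, 0.00 and 0.95 for the second, so the highest score picks Array 1 and Array 3 respectively. Because only the angle matters, multiplying either array by a positive constant leaves its score unchanged.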