I’m designing an algorithm to compare two objects, I’ve got a formula, but I

Question

Editorial Team

Asked: June 7, 2026 (2026-06-07T08:56:32+00:00)

I’m designing an algorithm to compare two objects, I’ve got a formula, but I don’t know if it’s as good as it could be.

Essentially, I’m comparing tropes between two games to say how similar they are:

$divisor = ((count($similar_concepts) - $iterator) + ($total - $iterator) + ($iterator));
echo "<BR> Value: ".($iterator / $divisor);

But that’s not readable, so here it is in words:

 SimilarTropes/( (OriginalTropes - SimilarTropes) + (NewTropes - SimilarTropes) + (SimilarTropes) )

I’m just not fully satisfied with the results, here’s an example:

Similarities: 47
NewTropes: 107
OriginalTropes: 156
Answer: 0.21759259259259

I don’t like these results because I feel those numbers should yield a higher percentage of similarity.

I’d love some input here, and if I’m in the wrong place, at least some guidance on where I should go instead.

Thanks a lot!

1 Answer

Editorial Team · Answer 1 · 2026-06-07T08:56:35+00:00

Translation to Mathematics

Let me attempt to translate what you have into a more mathematical formula. It should be easier from there.

Let A be the set of tropes from one game, so OriginalTropes = |A|. Let B be the set of tropes from the other game, so NewTropes = |B|. Then Similarities is the size of the intersection of A and B, |Intersect(A, B)|. Your formula is then:

|Intersect(A, B)| / ((|A| - |Intersect(A, B)|) + (|B| - |Intersect(A, B)|) + |Intersect(A, B)|)

Simplifying, we have:

|Intersect(A, B)| / (|A| + |B| - |Intersect(A, B)|)
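As a quick check of the simplification, here is a minimal Python sketch that evaluates both forms with the numbers from the question. Python is used only for illustration, and the function names `original_score` and `simplified_score` are hypothetical.

```python
def original_score(size_a, size_b, common):
    """The formula as posted: common / ((A - common) + (B - common) + common)."""
    return common / ((size_a - common) + (size_b - common) + common)


def simplified_score(size_a, size_b, common):
    """The simplified form: common / (A + B - common)."""
    return common / (size_a + size_b - common)


# Numbers from the question: OriginalTropes = 156, NewTropes = 107, Similarities = 47
a_count, b_count, shared = 156, 107, 47

print(original_score(a_count, b_count, shared))
print(simplified_score(a_count, b_count, shared))
```

Both calls evaluate 47 / 216, about 0.2176, which matches the result reported in the question.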

In other words, the denominator is |Union(A, B)|, so your similarity is the number of common items divided by the number of distinct items across both games. This ratio is known as the Jaccard index.

Now let’s take a special case. Take A = B. Then we have:

|Intersect(A, B)| = |A| = |B|. Your formula is then:

|A| / (|A| + |A| - |A|) = 1

Limitations

Now let’s say that the sets A and B are equal in size, but they have only half of their items in common. In other words,

|A| = |B| = 2 |Intersect(A, B)|

Your similarity score is then:

1/2 |A| / (2|A| - 1/2|A|) = 1/3

Ideally, this should be 1/2, not 1/3. You get something similar if you consider any sets where |A| = |B| = n and where |Intersect(A, B)| = n * p for 0 <= p <= 1: the formula gives p / (2 - p), which is below p for every 0 < p < 1.

In general, for sets of the above form you end up with your similarity algorithm underestimating the similarity between the two sets. This looks something like the purple curve in the image below. The blue curve is what cosine similarity would give. So if 50% are common and they are equal size, the two sets have a similarity of 0.5. Likewise, if they have 90% in common then it has a similarity of 0.9.

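[Figure: similarity score against the fraction of items in common for equal-sized sets; purple curve is the formula above, blue curve is cosine similarity]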
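For equal-sized sets with a fraction p in common, the formula above reduces to p / (2 - p), while the cosine measure reduces to p. The following Python sketch tabulates both so the gap is visible. It is illustrative only, and the function names `ratio_score` and `cosine_score` are hypothetical.

```python
def ratio_score(p):
    """Score from the question's formula when |A| = |B| and a fraction p is shared."""
    return p / (2 - p)


def cosine_score(p):
    """Cosine similarity for the same case: n*p / sqrt(n * n) = p."""
    return p


for p in (0.1, 0.25, 0.5, 0.75, 0.9, 1.0):
    print(f"p = {p:.2f}  ratio = {ratio_score(p):.3f}  cosine = {cosine_score(p):.3f}")
```

At p = 0.5 the ratio form gives 1/3, and the two measures only agree at p = 0 and p = 1.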

Cosine Similarity

What you may wish for is something similar to the angle between the two sets. Consider the total set of distinct elements, Union(A, B), and define N = |Union(A, B)|. Let a and b be N-dimensional representations of A and B, where each component has value 1 if the corresponding element is present in the original set and 0 if not.

Then take the cosine of the angle between a and b as the similarity:

Cos(theta) = Dot(a, b) / (||a|| * ||b||)

Note that the notation ||a|| refers to the euclidean length, not the size of the set. This may have better properties than what you were using before.

Example

Here’s an example. Let’s say:

 A = { "Big Swords", "Male Hero", "No Cars" }
 B = { "Male Hero", "Trains", "No Dragons" }

Then the full distinct set, Union(A, B) is given as:

Union(A, B) = { "Big Swords", "Male Hero", "No Cars", "Trains", "No Dragons" }

This means that N = |Union(A, B)| = 5. The tricky part becomes how to index each of these appropriately. You can actually use a dictionary plus a counter to index the elements. I’ll leave this to you to try out. For now, we’ll use the ordering of Union(A, B). Then a and b are given as:

a = { 1, 1, 1, 0, 0 }
b = { 0, 1, 0, 1, 1 }
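The dictionary-plus-counter indexing could look like the following Python sketch. It is one possible approach, assuming tropes are plain strings; `build_index` and `to_vector` are hypothetical helper names.

```python
def build_index(*collections):
    """Assign each distinct element the next free index, in first-seen order."""
    index = {}
    counter = 0
    for collection in collections:
        for element in collection:
            if element not in index:
                index[element] = counter
                counter += 1
    return index


def to_vector(collection, index):
    """Return a 0/1 list with a 1 at the index of every element present."""
    vector = [0] * len(index)
    for element in collection:
        vector[index[element]] = 1
    return vector


# Lists (not sets) are used here so the ordering matches the example
A = ["Big Swords", "Male Hero", "No Cars"]
B = ["Male Hero", "Trains", "No Dragons"]

index = build_index(A, B)
a = to_vector(A, index)
b = to_vector(B, index)

print(a)  # [1, 1, 1, 0, 0]
print(b)  # [0, 1, 0, 1, 1]
```

Any consistent ordering works, since the dot product and the norms do not depend on which index each trope receives.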

At this point it becomes standard mathematics:

Dot(a, b) = 1
||a|| = sqrt(3)
||b|| = sqrt(3)
Similarity = 1 / (sqrt(3) * sqrt(3)) = 1 / 3

Sample Implementation

using System;
using System.Collections.Generic;
using System.Linq;

public static double Compare(IEnumerable<string> A, IEnumerable<string> B)
{
    // Treat both inputs as sets so a repeated trope counts once
    var setA = new HashSet<string>(A);
    var setB = new HashSet<string>(B);

    // An empty set has zero norm; return 0 instead of dividing by zero
    if (setA.Count == 0 || setB.Count == 0)
        return 0.0;

    // Form the union of A and B; its size N is the vector dimension
    var C = setA.Union(setB).ToList();

    // a and b are N (C.Count) dimensional bi-valued (0 or 1) vectors
    var a = new int[C.Count];
    var b = new int[C.Count];

    var map = new Dictionary<string, int>();

    // Map from the original key to an index in the union
    for (int i = 0; i < C.Count; i++)
        map[C[i]] = i;

    // Set the 1's in the N-dimensional representation of A
    foreach (var element in setA)
        a[map[element]] = 1;

    // And do the same for B
    foreach (var element in setB)
        b[map[element]] = 1;

    int dot = 0;

    // Easy part :) Standard vector dot product
    for (int i = 0; i < C.Count; i++)
        dot += a[i] * b[i];

    // The euclidean norm of a 0/1 vector is the square root of its
    // number of ones, so ||a|| * ||b|| = sqrt(|A| * |B|)
    return dot / Math.Sqrt((double) setA.Count * setB.Count);
}
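Because a and b contain only 0s and 1s, Dot(a, b) equals |Intersect(A, B)| and ||a|| equals sqrt(|A|), so the vectors never actually need to be built. A minimal Python sketch of that shortcut follows. Python is used for illustration only, and `cosine_from_counts` and `cosine_from_sets` are hypothetical names, not part of the code above.

```python
import math


def cosine_from_counts(size_a, size_b, common):
    """Cosine similarity of two sets given only their sizes and overlap."""
    if size_a == 0 or size_b == 0:
        return 0.0  # assumption: an empty set is treated as having no similarity
    return common / math.sqrt(size_a * size_b)


def cosine_from_sets(a, b):
    """Same measure computed directly from two collections of hashable items."""
    set_a, set_b = set(a), set(b)
    return cosine_from_counts(len(set_a), len(set_b), len(set_a & set_b))


# The worked example: one shared trope out of three each
A = {"Big Swords", "Male Hero", "No Cars"}
B = {"Male Hero", "Trains", "No Dragons"}
print(cosine_from_sets(A, B))

# The numbers from the question: 156 and 107 tropes, 47 shared
print(cosine_from_counts(156, 107, 47))
```

The first call gives 1/3, matching the worked example, and the second gives roughly 0.36, compared with about 0.22 from the original formula.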
