Sorry if this is a total noob question, but I want to find similar values across lists. More specifically, I want to see if there is a way to score how similar the items are.
I know that in Python I can compare two lists with ‘==’ to see if they are the same, but what if they are not exactly the same and instead have somewhat similar (or dissimilar) values?
Here’s an example:
#Batch one
[1, 10, 20]
[5, 15, 10]
[70, 19, 15]
[50, 40, 20]
#Batch two
[46, 19, 8]
[6, 14, 8]
[2, 11, 44]
Say I want to score/rank the lists in the two batches by how similar they are to each other. I thought I could just add up all the numbers in each list and compare the totals, but I don’t think that works, because [5, 6, 1000] and [600, 200, 211] would seem similar (both sum to 1011). In this example, [5, 15, 10] and [6, 14, 8] should get the highest score.
I also thought of dividing each pair of values and looking at the percent difference, but that seems really expensive if the lists get large with many variables (I may eventually have thousands of lists with over 800 variables in each), and I suspect there may be a better approach.
Any suggestions?
How about using the Euclidean distance? The smaller the distance between two lists, the more similar they are.
In a list comprehension:
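The snippet that belonged here did not survive, so what follows is only a sketch of how it might look, not the original code. It uses the batches from the question; the names `batch_one`, `batch_two`, `distances` and `best` are my own placeholders. Each pair gets the square root of the summed squared differences, so a smaller number means a closer match.

```python
import math

# Data copied from the question
batch_one = [[1, 10, 20], [5, 15, 10], [70, 19, 15], [50, 40, 20]]
batch_two = [[46, 19, 8], [6, 14, 8], [2, 11, 44]]

# One (distance, list_from_batch_one, list_from_batch_two) tuple per pair
distances = [
    (math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b))), a, b)
    for a in batch_one
    for b in batch_two
]

# The pair with the smallest distance is the most similar one
best = min(distances, key=lambda item: item[0])
print(best)  # the closest pair is [5, 15, 10] and [6, 14, 8]
```

For that pair the squared differences are 1, 1 and 4, so the distance is sqrt(6), roughly 2.45, which agrees with the pair the question expected to score highest.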
Or more written out:
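Again, the original snippet is missing, so this is a hedged sketch with explicit loops rather than the code that was posted. The function names `euclidean_distance` and `rank_pairs` are hypothetical, and it assumes the two lists being compared always have the same length.

```python
import math


def euclidean_distance(a, b):
    """Return the Euclidean distance between two equal-length lists."""
    if len(a) != len(b):
        raise ValueError("lists must have the same length")
    total = 0
    for x, y in zip(a, b):
        total += (x - y) ** 2  # squared difference per position
    return math.sqrt(total)


def rank_pairs(batch_one, batch_two):
    """Return (distance, a, b) tuples sorted from most to least similar."""
    results = []
    for a in batch_one:
        for b in batch_two:
            results.append((euclidean_distance(a, b), a, b))
    results.sort(key=lambda item: item[0])  # smallest distance first
    return results


batch_one = [[1, 10, 20], [5, 15, 10], [70, 19, 15], [50, 40, 20]]
batch_two = [[46, 19, 8], [6, 14, 8], [2, 11, 44]]

ranked = rank_pairs(batch_one, batch_two)
for distance, a, b in ranked[:3]:
    print(round(distance, 2), a, b)
```

Sorting the whole result gives a ranking rather than just the single best match, which seems closer to the "score/rank" wording in the question.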