I have a dataset of 25 integer fields and 40k records, e.g.
1:
field1: 0
field2: 3
field3: 1
field4: 2
[...]
field25: 1
2:
field1: 2
field2: 1
field3: 4
field4: 0
[...]
field25: 2
etc.
I’m testing with MySQL but am not tied to it.
Given a single record, I need to retrieve the records most similar to it; something like the lowest average difference of the fields. I started looking at the following, but I don’t know how to map this onto the problem of searching for similarities in a large dataset.
Here’s a possibility with straight average distance between each of the fields (the value after each minus is from the given record needing a match):