Take for example the list (L):
John, John, John, John, Jon
We are to presume that one item is correct (John in this case) and give a probability that it is correct.
First (and good!) attempt: MostFrequentItem(L).Count / L.Count (here 4/5, i.e. 80% likelihood)
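The first attempt above can be sketched in Python roughly like this. The function name `most_frequent_ratio` is hypothetical and not from the question. The sketch also shows why the ratio alone cannot tell the next two lists apart: both come out at 0.5.

```python
from collections import Counter

def most_frequent_ratio(items):
    """Return (most frequent item, its count / total count), i.e. MostFrequentItem(L).Count / L.Count."""
    counts = Counter(items)
    # most_common(1) returns [(item, count)]; ties keep first-encountered order
    top_item, top_count = counts.most_common(1)[0]
    return top_item, top_count / len(items)

L = ["John", "John", "John", "John", "Jon"]
print(most_frequent_ratio(L))  # ('John', 0.8)

# Both of the lists below score the same, which is the problem raised next
print(most_frequent_ratio(["John", "John", "Jon", "Jonny"]))  # ('John', 0.5)
print(most_frequent_ratio(["John", "John", "Jon", "Jon"]))    # ('John', 0.5)
```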
But consider the cases:
John, John, Jon, Jonny
John, John, Jon, Jon
I want the likelihood that John is the correct item to be higher in the first list! I know I have to count the second most frequent item and compare the two.
Any ideas? This is really busting my brain!
Thx,
Andrew
As an extremely simple solution, compared to the more correct but complicated solutions above, you could take the count of every variation, square each count, and use the squares as weights. A variation's probability is then its weight divided by the sum of all the weights. So:
John, John, Jon, Jonny
would give John a weight of 4 (2²) and the other two a weight of 1 each, for a probability of 4/6, about 67%, that John is correct.
John, John, Jon, Jon
would give weights of 4 to both John and Jon, so John's probability is only 4/8 = 50%.
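A minimal Python sketch of the squared-count weighting, assuming the probability is a candidate's squared count divided by the sum of all squared counts. The function name `squared_weight_probability` is hypothetical. It reproduces the two worked examples and also scores the original list (16/17, about 94%).

```python
from collections import Counter

def squared_weight_probability(items, candidate):
    """Weight each distinct value by count**2; return candidate's share of the total weight."""
    counts = Counter(items)
    weights = {value: n * n for value, n in counts.items()}
    # An empty list would divide by zero here; callers should pass a non-empty list
    return weights.get(candidate, 0) / sum(weights.values())

lists = [
    ["John", "John", "John", "John", "Jon"],  # 16 / (16 + 1)
    ["John", "John", "Jon", "Jonny"],         # 4 / (4 + 1 + 1)
    ["John", "John", "Jon", "Jon"],           # 4 / (4 + 4)
]
for L in lists:
    print(f"{squared_weight_probability(L, 'John'):.3f}")
# 0.941
# 0.667
# 0.500
```

Squaring rewards a dominant variation and penalizes a strong runner-up, which is exactly the distinction the question asks for; a plain count ratio treats the last two lists identically.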