I have:
- A correct numerical ID, such as a phone number or social-security number
- Another number, from some data-entry form
The 2nd number is similar to the 1st number, but not equal to it.
Both numbers are valid.
I want to calculate how probable it is that the 2nd number is actually a typing error of the 1st number.
Such errors may include:
- Off by a few digits
- Transposed digits
- Misinterpreted digits (1-7, 4-9, 3-8, 2-5)
Does anyone know of an existing algorithm or code for this?
Edit:
I’m not looking for a general string-similarity algorithm. I’m looking for an algorithm optimized for human number-entry typing errors, or for some research about this topic.
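There are several algorithms for measuring string similarity.
You could implement a variant of the Levenshtein distance or Damerau-Levenshtein distance that weights each type of error differently, for example by charging less for a transposition or for a commonly confused pair such as 1-7 than for an arbitrary substitution.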
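
As a rough illustration of that idea, here is a minimal Python sketch of a weighted edit distance. It uses the optimal string alignment variant of Damerau-Levenshtein (adjacent transpositions only), and it treats the digit pairs from the question (1-7, 4-9, 3-8, 2-5) as cheap substitutions. The function names and all cost values are my own assumptions, not measured figures. You would need to tune them against real data-entry errors. The result is a distance, not a calibrated probability: lower just means "more plausibly a typo".

```python
# Digit pairs the question lists as easily misread.
CONFUSABLE = {frozenset(p) for p in ("17", "49", "38", "25")}

# Hypothetical costs, chosen only for illustration; tune on real data.
SUB_CONFUSABLE = 0.3   # e.g. 1 entered as 7
SUB_OTHER = 1.0        # any other wrong digit
TRANSPOSE = 0.4        # two adjacent digits swapped
INDEL = 1.0            # a digit dropped or added


def sub_cost(a, b):
    """Cost of entering digit b where digit a was expected."""
    if a == b:
        return 0.0
    return SUB_CONFUSABLE if frozenset((a, b)) in CONFUSABLE else SUB_OTHER


def typo_distance(correct, entered):
    """Weighted optimal-string-alignment distance between two digit strings."""
    n, m = len(correct), len(entered)
    d = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        d[i][0] = i * INDEL
    for j in range(1, m + 1):
        d[0][j] = j * INDEL

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            a, b = correct[i - 1], entered[j - 1]
            best = min(
                d[i - 1][j] + INDEL,               # digit missing from entry
                d[i][j - 1] + INDEL,               # extra digit in entry
                d[i - 1][j - 1] + sub_cost(a, b),  # match or substitution
            )
            # Adjacent transposition: "45" entered as "54".
            if (i > 1 and j > 1 and a != b
                    and a == entered[j - 2] and correct[i - 2] == b):
                best = min(best, d[i - 2][j - 2] + TRANSPOSE)
            d[i][j] = best
    return d[n][m]


if __name__ == "__main__":
    for entered in ("12354", "72345", "12945", "1234"):
        print(entered, typo_distance("12345", entered))
```

With these costs, a swapped pair or a 1-7 mix-up scores well below an unrelated wrong digit, which is the ranking the question asks for. To turn the distance into something closer to a probability, you would have to estimate the costs from observed error frequencies.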