According to the python-Levenshtein.ratio source:
https://github.com/miohtama/python-Levenshtein/blob/master/Levenshtein.c#L722
it’s computed as (lensum - ldist) / lensum. This works for
# pip install python-Levenshtein
import Levenshtein
Levenshtein.distance('ab', 'a') # returns 1
Levenshtein.ratio('ab', 'a') # returns 0.666666
However, it seems to break with
Levenshtein.distance('ab', 'ac') # returns 1
Levenshtein.ratio('ab', 'ac') # returns 0.5
I feel I must be missing something very simple, but why not 0.75, i.e. (4 - 1) / 4?
By looking more carefully at the C code, I found that this apparent contradiction is due to the fact that ratio treats the “replace” edit operation differently than the other operations (i.e. with a cost of 2), whereas distance treats them all the same with a cost of 1.
This can be seen in the calls to the internal levenshtein_common function made within the ratio_py function:
https://github.com/miohtama/python-Levenshtein/blob/master/Levenshtein.c#L727
and by the distance_py function:
https://github.com/miohtama/python-Levenshtein/blob/master/Levenshtein.c#L715
which ultimately results in different cost arguments being sent to another internal function, lev_edit_distance, which has the following doc snippet:
Code of lev_edit_distance():
[ANSWER]
So in my example, ratio('ab', 'ac') implies a replacement operation (cost of 2) over the total length of the strings (4), so ldist = 2 and the ratio is (4 - 2) / 4 = 0.5.
That explains the “how”. I guess the only remaining aspect would be the “why”, but for the moment I’m satisfied with this understanding.
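To double-check the arithmetic without the C extension, here is a minimal pure-Python sketch of the behaviour described above. It is not the library's actual code: the names edit_distance and replace_cost are my own, and returning 1.0 for two empty strings is an assumption I made to avoid dividing by zero.

```python
def edit_distance(a, b, replace_cost=1):
    """Edit distance where insert/delete cost 1 and replace costs `replace_cost`."""
    # prev[j] holds the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # turning a[:i] into the empty string takes i deletions
        for j, cb in enumerate(b, start=1):
            substitute = prev[j - 1] + (0 if ca == cb else replace_cost)
            delete = prev[j] + 1
            insert = curr[j - 1] + 1
            curr.append(min(substitute, delete, insert))
        prev = curr
    return prev[-1]


def ratio(a, b):
    """(lensum - ldist) / lensum, with ldist computed using a replace cost of 2."""
    lensum = len(a) + len(b)
    if lensum == 0:
        return 1.0  # assumption: two empty strings count as identical
    ldist = edit_distance(a, b, replace_cost=2)
    return (lensum - ldist) / lensum


print(edit_distance('ab', 'a'))   # 1
print(ratio('ab', 'a'))           # (3 - 1) / 3
print(edit_distance('ab', 'ac'))  # 1
print(ratio('ab', 'ac'))          # (4 - 2) / 4 = 0.5
```

With a replace cost of 2, a substitution is never cheaper than a delete followed by an insert, which is why 'ab' vs 'ac' ends up with a weighted distance of 2 even though distance reports 1.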