I am trying to correct a text that has some very typical scanning errors (l mistaken for I and vice versa). Basically, I would like the replacement string in re.sub to depend on the number of times the ‘I’ is detected, something like this:
re.sub("(\w+)(I+)(\w*)", "\g<1>l+\g<3>", "I am stiII here.")
What’s the best way to achieve this?
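Pass a function as the replacement argument instead of a string, as described in the docs. re.sub calls the function with each match object and uses the string it returns, so your function can identify the mistake and build the best substitution from it, for example one l per matched I.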
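Here is a minimal sketch of that idea. The function name `fix_capital_i` is made up for illustration, and the lazy `\w+?` in the first group is my assumption about how to keep the whole run of Is in the second group. It only handles the first run of Is in each word, so treat it as a starting point rather than a complete fix.

```python
import re

# Lazy \w+? stops as early as possible, so the entire run of 'I's lands in group 2.
pattern = re.compile(r"(\w+?)(I+)(\w*)")

def fix_capital_i(match):
    # Emit one 'l' for every 'I' captured in group 2; keep the rest of the word as-is.
    return match.group(1) + "l" * len(match.group(2)) + match.group(3)

print(pattern.sub(fix_capital_i, "I am stiII here."))
# prints: I am still here.
```

The standalone "I" is left alone because the pattern requires at least one word character before the run of Is.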
(Note that I modified your regex so that the repeated Is appear in a single group.)