I am trying to correct a text that has some very typical scanning errors

Question

Asked: June 1, 2026

I am trying to correct a text that has some very typical scanning errors (l mistaken for I and vice versa). Basically, I would like the replacement string in re.sub to depend on the number of times the ‘I’ is detected, something like this:

re.sub("(\w+)(I+)(\w*)", "\g<1>l+\g<3>", "I am stiII here.")

What’s the best way to achieve this?

1 Answer

Editorial Team · Answer 1 · 2026-06-01T04:55:02+00:00

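Pass a function as the replacement argument instead of a string, as described in the re.sub docs. The function is called with each match object and returns the replacement text, so it can identify the mistake and build the substitution from the length of the matched run.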

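import re

def replacement(match):
    if "I" in match.group(2):
        # One lowercase "l" for each capital "I" in the matched run
        return match.group(1) + "l" * len(match.group(2)) + match.group(3)
    # Add additional cases here and as ORs in your regex
    return match.group(0)  # no known case: leave the text unchanged

re.sub(r"(\w+?)(II+)(\w*)", replacement, "I am stiII here.")
# returns 'I am still here.'

(Note that I modified your regex so the repeated Is appear in one group: II+ requires at least two of them, and the lazy \w+? keeps the first group from swallowing part of a longer run such as III.)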
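As a variation on the function-replacement approach above, here is a minimal sketch that matches only the run of I's and lets a lookbehind supply the context. The helper name fix_scanned_l is hypothetical, and the heuristic (a capital I directly after a lowercase letter is really a misread l) is an assumption about the scanned text, not something the question states.

```python
import re

# Hypothetical helper; the "capital I after a lowercase letter" heuristic
# is an assumption about how the scanning errors appear in the text.
def fix_scanned_l(text):
    # (?<=[a-z]) checks for a preceding lowercase letter without consuming it,
    # so the match is just the run of I's and its length sets the number of l's.
    return re.sub(r"(?<=[a-z])I+", lambda m: "l" * len(m.group()), text)

print(fix_scanned_l("I am stiII here."))
```

Because the standalone pronoun "I" has no lowercase letter before it, it is left alone. The same heuristic would also rewrite legitimate mixed-case tokens (for example an identifier such as "myId"), so it is probably only suitable for ordinary prose.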
