I'd like to clean up a subtitle file that has many OCR errors. One of them is that a lowercase l is displayed as an uppercase I. Of course, sometimes the I really is an I, mainly in these cases:
- The beginning of a sentence: "I'm Ieaving..." or "- I'm Ieaving..."
- In names: "IsabeIIe"
- Maybe a few weird cases.
Since names are difficult to detect, I figured it would be best to replace only the I's that are directly preceded by a lowercase letter and check the rest manually. After the conversion I would get "I'm Ieaving" and "Isabelle". This is the most bare-bones automated solution I can think of, since not many words have a lowercase letter directly before an uppercase letter.
How can I do this with a regex? Thanks in advance.
If your regex engine supports lookbehind, you can find all I's preceded by a lowercase letter like this: `(?<=[a-z])I`
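A minimal Python sketch of the lookbehind approach, assuming Python's `re` module and ASCII lowercase letters only (`[a-z]` will not match accented letters such as é). The helper name `fix_ocr_l` is hypothetical. One detail worth noting: with a plain `(?<=[a-z])I`, the second I in "IsabeIIe" is preceded by an uppercase I in the original text, so it would not be replaced. Matching `I+` and substituting the same number of l's handles such runs:

```python
import re

# Lookbehind: match a run of one or more "I" characters only when the
# character immediately before the run is a lowercase ASCII letter.
# The lookbehind consumes nothing, so only the I's themselves are replaced.
LOOKBEHIND = re.compile(r"(?<=[a-z])I+")


def fix_ocr_l(text: str) -> str:
    # Hypothetical helper: replace each matched run with the same number
    # of lowercase l's, so "IsabeIIe" becomes "Isabelle", not "IsabelIe".
    return LOOKBEHIND.sub(lambda m: "l" * len(m.group()), text)


if __name__ == "__main__":
    for line in ["I'm Ieaving...", "- I'm Ieaving...", "IsabeIIe"]:
        print(fix_ocr_l(line))
```

The I's at the start of "I'm" and "Ieaving" are left alone because nothing lowercase precedes them, which matches the manual-review plan in the question.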
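Otherwise, you could match both characters with `([a-z])I`; the second character of each match is the I, so you can replace the match with the captured letter followed by an l (written `\1l` or `$1l`, depending on the engine).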
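For engines without lookbehind, here is a rough Python equivalent that consumes the preceding lowercase letter and puts it back through a capture group. The helper name `fix_ocr_l_no_lookbehind` is hypothetical, and `[a-z]` again assumes ASCII lowercase letters:

```python
import re

# No lookbehind: consume the lowercase letter as well, capture it in
# group 1, and capture the run of I's that follows it in group 2.
PAIR = re.compile(r"([a-z])(I+)")


def fix_ocr_l_no_lookbehind(text: str) -> str:
    # Hypothetical helper: write the captured lowercase letter back
    # unchanged and turn every captured I into an l.
    return PAIR.sub(lambda m: m.group(1) + "l" * len(m.group(2)), text)


if __name__ == "__main__":
    print(fix_ocr_l_no_lookbehind("IsabeIIe"))
    print(fix_ocr_l_no_lookbehind("I'm Ieaving..."))
```

With the simpler string replacement `re.sub(r"([a-z])I", r"\1l", text)`, "IsabeIIe" becomes "IsabelIe": after "eI" is consumed, the next I is no longer preceded by a lowercase letter within a fresh match. Capturing `I+` as above avoids that.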