A regex substitution takes three inputs:
- The match pattern
- The replacement pattern
- The original string
The regex engine produces three things that are of interest to me:
- The matched string
- The replacement string
- The final processed string
When using re.sub, the final string is what’s returned. But is it possible to access the other two things, the matched string and replacement string?
Here’s an example:
import re

orig = "This is the original string."
matchpat = "(orig.*?l)"
replacepat = "not the \\1"
final = re.sub(matchpat, replacepat, orig)
print(final)
# This is the not the original string.
The matched string is "original" and the replacement string is "not the original". Is there a way to get them? I’m writing a script to search and replace in many files, and I want it to print what it’s finding and replacing, without printing out the entire line.
Edit: as @F.J has pointed out, the above will remember only the last match/replacement. This version handles multiple occurrences:
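A minimal sketch of one way to do this, not necessarily the code originally posted here. It assumes a callable is passed as the replacement argument to re.sub and that Match.expand is used to apply the replacement template; the names sub_with_log, record, and pairs are hypothetical and chosen only for illustration.

```python
import re


def sub_with_log(matchpat, replacepat, orig):
    """Return (final, pairs); pairs holds one (matched, replacement) tuple per occurrence."""
    pairs = []

    def record(match):
        # expand() applies the replacement template (e.g. \1) to this match,
        # the same way re.sub would with a string replacement.
        replacement = match.expand(replacepat)
        pairs.append((match.group(0), replacement))
        return replacement

    # re.sub calls record() once for every non-overlapping match.
    final = re.sub(matchpat, record, orig)
    return final, pairs


final, pairs = sub_with_log(r"(orig.*?l)", r"not the \1", "This is the original string.")
for matched, replacement in pairs:
    print(repr(matched), "->", repr(replacement))
print(final)
```

Because the callback runs once per match, every occurrence is recorded rather than only the last one, and the script can print each pair without printing the whole line.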