I’m using Python to write a regular expression for replacing parts of a string with an XML node.
The source string looks like:
    Hello
    REPLACE(str1) this is to replace
    REPLACE(str2) this is to replace

And the result string should be like:

    Hello
    <replace name="str1"> this is to replace </replace>
    <replace name="str2"> this is to replace </replace>
Can anyone help me?
What makes your problem a little bit tricky is that you want to match inside of a multiline string. You need to use the `re.MULTILINE` flag to make that work.

Then, you need to match some groups inside your source string, and use those groups in the final output. Here is code that works to solve your problem:
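A minimal sketch of such a solution, put together from the explanation that follows. The function name `mksub` and the pattern pieces come from that explanation; the variable names `s_pat`, `pat`, `s_input`, and `s_output`, and the assumption that each `REPLACE(...)` starts its own line, are illustrative.

```python
import re

# Pattern assembled from the pieces described in the walkthrough.
# A raw string keeps the backslashes literal.
s_pat = r"^\s*REPLACE\(([^)]+)\)(.*)$"
pat = re.compile(s_pat, re.MULTILINE)

# Assumed input layout: each REPLACE(...) begins a new line.
s_input = """\
Hello
REPLACE(str1) this is to replace
REPLACE(str2) this is to replace"""


def mksub(m):
    # m.groups() is a tuple (name, rest_of_line); both fill the template.
    return '<replace name="%s">%s</replace>' % m.groups()


# re.sub() calls mksub() once per match and uses its return value.
s_output = pat.sub(mksub, s_input)
print(s_output)
```

Note that group 2 captures the rest of the line exactly as written, so the leading space after `)` is kept and no space is added before `</replace>`.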
The only tricky part is the regular expression pattern. Let’s look at it in detail.
- `^` matches the start of a string. With `re.MULTILINE`, this also matches the start of a line within a multiline string; in other words, it matches right after a newline in the string.
- `\s*` matches optional whitespace.
- `REPLACE` matches the literal string “REPLACE”.
- `\(` matches the literal string “(”.
- `(` begins a “match group”.
- `[^)]` means “match any character but a ‘)’”.
- `+` means “match one or more of the preceding pattern”.
- `)` closes a “match group”.
- `\)` matches the literal string “)”.
- `(.*)` is another match group containing `.*`.
- `$` matches the end of a string. With `re.MULTILINE`, this also matches the end of a line within a multiline string; in other words, it matches just before a newline character in the string.
- `.` matches any character except a newline, and `*` means to match zero or more of the preceding pattern. Thus `.*` matches anything, up to the end of the line.

So, our pattern has two “match groups”. When you run `re.sub()`, it will make a “match object” which will be passed to `mksub()`. The match object has a method, `.groups()`, that returns the matched substrings as a tuple, and that gets substituted in to make the replacement text.

EDIT: You actually don’t need to use a replacement function. You can put the special string
`\1` inside the replacement text, and it will be replaced by the contents of match group 1. (Match groups count from 1; the special match group 0 corresponds to the entire string matched by the pattern.) The only tricky part of the `\1` string is that `\` is special in strings. In a normal string, to get a `\`, you need to put two backslashes in a row, like so: `"\\1"`. But you can use a Python “raw string” to conveniently write the replacement pattern. Doing so, you get this:
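A sketch of that raw-string version, assuming the same input layout as before (each `REPLACE(...)` on its own line). The variable names, including `s_repl`, are illustrative.

```python
import re

pat = re.compile(r"^\s*REPLACE\(([^)]+)\)(.*)$", re.MULTILINE)

# Assumed input layout: each REPLACE(...) begins a new line.
s_input = """\
Hello
REPLACE(str1) this is to replace
REPLACE(str2) this is to replace"""

# \1 and \2 stand for match groups 1 and 2.
# The raw string means the backslashes do not need to be doubled.
s_repl = r'<replace name="\1">\2</replace>'

s_output = pat.sub(s_repl, s_input)
print(s_output)
```

The replacement string does the same job as the function: group 1 supplies the `name` attribute and group 2 supplies the rest of the line.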