I have a multi-line string in which I’d like to replace a block of text, but I don’t understand why my regular expression isn’t working. For some reason, a period in the string stops the expression from matching.
My string:
s = """
[some_previous_text]
<start>
one_period .
<end>
[some_text_after]
"""
What I’d like to end up with:
s = """
[some_previous_text]
foo
[some_text_after]
"""
What I initially tried doesn’t match anything:
>>> import re
>>> s = "<start>\none_period .\n<end>"
>>> print(re.sub("<start>[^.]*<end>", "foo", s))
<start>
one_period .
<end>
However, when I took the period out, it worked fine:
>>> import re
>>> s = "<start>\nno_period\n<end>"
>>> print(re.sub("<start>[^.]*<end>", "foo", s))
foo
Also, when I put an <end> tag before the period, it matched the first <end> tag:
>>> import re
>>> s = "<start>\n<end>\none_period .\n<end>"
>>> print(re.sub("<start>[^.]*<end>", "foo", s))
foo
one_period .
<end>
So what’s going on here? Why does the period stop the [^.]* matching?
EDIT:
SOLVED
I mistakenly thought that the caret ^ was for newline matching. What I needed was the re.DOTALL flag (as Amber pointed out). Here’s the expression I’m now using:
>>> import re
>>> s = "<start>\none_period .\n<end>"
>>> print(re.sub("<start>.*<end>", "foo", s, flags=re.DOTALL))
foo
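Applied to the full string from the question, the same approach gives the desired result. This is a minimal sketch in Python 3; it swaps in the non-greedy .*? (my choice, not the expression above) so that if the text ever contained more than one <start>...<end> block, each block would be replaced separately instead of one match running from the first <start> to the last <end>. The two_blocks string is a hypothetical input made up for illustration.

```python
import re

# The full multi-line string from the question.
s = """
[some_previous_text]
<start>
one_period .
<end>
[some_text_after]
"""

# re.DOTALL lets . match newlines; .*? stops at the first <end> it can reach.
result = re.sub(r"<start>.*?<end>", "foo", s, flags=re.DOTALL)
print(result)

# Hypothetical input with two blocks separated by text that should survive.
two_blocks = "<start>\na .\n<end>\nkeep me\n<start>\nb .\n<end>"

# Greedy .* runs to the LAST <end>, so "keep me" is swallowed into one match.
print(re.sub(r"<start>.*<end>", "foo", two_blocks, flags=re.DOTALL))

# Non-greedy .*? replaces each block on its own and keeps "keep me".
print(re.sub(r"<start>.*?<end>", "foo", two_blocks, flags=re.DOTALL))
```

With a single block, as in the question, the greedy and non-greedy forms give the same result; the difference only shows up once several blocks appear in the same string.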
Why wouldn’t it?
[^.] is “the set of all characters that are not a period,” and thus doesn’t match periods. Perhaps you meant to put .* (any number of any characters) instead of [^.]*? To match across newlines as well, specify the re.DOTALL flag, because . does not match a newline by default.
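A small sketch (Python 3, using the same sample string as the question) of the two behaviours involved: the negated class [^.] matches newlines but never a period, while a bare . matches periods but skips newlines unless re.DOTALL is given.

```python
import re

s = "<start>\none_period .\n<end>"

# [^.] is a negated character class: any character except a literal period.
# It matches "\n" fine, but stops at the "." in "one_period ."
print(re.search(r"<start>[^.]*<end>", s))  # None

# A bare . matches any character except "\n" by default,
# so this fails too, this time at the first newline after <start>.
print(re.search(r"<start>.*<end>", s))  # None

# With re.DOTALL, . also matches "\n", so the whole block is replaced.
print(re.sub(r"<start>.*<end>", "foo", s, flags=re.DOTALL))  # foo
```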