Code:
import re
s = '<br><br />A<br />B'
print(re.sub(r'<br.*?>\w$', '', s))
I expected this to return <br><br />A, but it returns an empty string ''!
Any suggestions?
Laziness only works from left to right: the engine still looks for the leftmost match, and a lazy quantifier basically means "don't match more unless you failed to match". Here's what's going on:
1. The regex engine matches <br at the start of the string.
2. .*? is ignored for now, because it is lazy.
3. The engine tries to match >, and succeeds.
4. It tries to match \w and fails, because the next character is <. Now it gets interesting: the engine starts backtracking and sees the .*? rule. In this case, . can match the first >, so there's still hope for that match.
5. This keeps happening until the engine reaches the > that closes the second tag. There >\w can match, but $ fails. Again, the engine comes back to the lazy .*? rule and keeps matching, until it has matched the whole string, <br><br />A<br />B.

Luckily, there's an easy solution: by using <br[^>]*>\w$ instead, you don't allow matching outside of your tags, so it should replace only the last occurrence.
Strictly speaking, this doesn't work well for real HTML, because tag attributes can contain > characters, but I assume it's just an example.
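
To check this against the string from the question, here is a minimal sketch. The variable names s, lazy, and fixed are only for illustration. It first shows what the lazy pattern actually matches, then applies the [^>]* version.

```python
import re

s = '<br><br />A<br />B'

# The lazy pattern still starts at the leftmost '<br', so the match
# keeps growing until '>\w$' finally succeeds at the end of the string.
lazy = re.search(r'<br.*?>\w$', s)
print(lazy.group(0))   # <br><br />A<br />B

# [^>]* cannot cross a '>', so the match has to stay inside a single tag.
# Only the last tag plus the trailing word character is removed.
fixed = re.sub(r'<br[^>]*>\w$', '', s)
print(fixed)           # <br><br />A
```

Because the first pattern matches the entire string, re.sub with it returns the empty string, which is exactly the behaviour described in the question.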