str = "<test>0</test>"
print re.search("<.*?>", str).group()
print re.search(">.*?<", str).group()
>> <text>
>> >0<
How can I get it so that the resulting text is “test” and “0” and not include the two characters I used as markers in the regex?
Sign Up to our social questions and Answers Engine to ask questions, answer people’s questions, and connect with other people.
Login to our social questions & Answers Engine to ask questions answer people’s questions & connect with other people.
Lost your password? Please enter your email address. You will receive a link and will create a new password via email.
Please briefly explain why you feel this question should be reported.
Please briefly explain why you feel this answer should be reported.
Please briefly explain why you feel this user should be reported.
You shouldn’t be using regex to parse XML/HTML, see murgatroid99’s comment.
That being said, here is how you can get the results you want for this example using regex. Use a capturing group:
Note that you shouldn’t use
stras a variable name, as it will mask the built-in type.An alternative to a capturing group would be a lookbehind and lookahead:
Using raw string literals (
r"...") isn’t necessary for these in particular, but it is good to get into the habit of using them when writing regular expressions to make sure that backslashes are handled properly.