s = re.sub(r"<style.*?</style>", "", s)
Isn’t this code supposed to remove the style blocks from the string s? Why doesn’t it work? I am trying to remove the following code:
<style type="text/css">
body { ... }
</style>
Any suggestion?
No, it’s the re.DOTALL flag that is necessary!
re.DOTALL
Make the ‘.’ special character match any character at all, including a newline; without this flag, ‘.’ will match anything except a newline.
http://docs.python.org/library/re.html#re.DOTALL
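As a minimal sketch of the difference, here is the substitution from the question run with and without the flag. The sample string is made up for illustration. Note that the flag is passed as the keyword argument flags=, because the fourth positional argument of re.sub is count.

```python
import re

# Hypothetical sample input: a style block spread over several lines
s = '<p>before</p>\n<style type="text/css">\nbody { color: red; }\n</style>\n<p>after</p>'

# Without re.DOTALL, '.' stops at the first newline, so nothing matches
without_flag = re.sub(r"<style.*?</style>", "", s)

# With re.DOTALL, '.' also matches newlines and the whole block is removed
with_flag = re.sub(r"<style.*?</style>", "", s, flags=re.DOTALL)

print(without_flag == s)  # True: the string is unchanged
print(repr(with_flag))
```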
Edit
In some cases, you may need the dot to match every character (newlines included) in one region of a string, and only non-newline characters in another region of the string. But the re.DOTALL flag doesn’t allow this, since it changes the dot everywhere in the pattern.
In this case, it’s useful to know the following trick: use [\s\S] to stand for any character.
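A short sketch of that trick, with an invented input string: [\s\S]*? crosses the newlines inside the style block, while the plain .* after the closing tag still stops at the end of its line.

```python
import re

# Hypothetical sample input: a multi-line style block followed by text on the same line
s = "<style>\nbody { margin: 0; }\n</style> trailing text\nnext line"

# [\s\S] matches any character, newline included; '.' keeps its default meaning
cleaned = re.sub(r"<style[\s\S]*?</style>.*", "", s)

print(repr(cleaned))
```

Here the style block and the rest of the line it ends on are removed, while the following line is left alone.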