I need to write a regex that captures the following:
Fixed unicode text:
<br>
<strong>
text I am looking for
</strong>
I currently do something like this:
regex = re.compile(unicode('Fixed unicode text:.*','utf-8'))
How can I modify that to capture the remaining text?
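Simply prefix the string literal with u (in Python 2.x; Python 3 needs no prefix) to get a unicode string, and use parentheses to capture the remaining text, for example u'Fixed unicode text:(.*)'. Note that . does not match newlines by default, so pass re.DOTALL if the text you want spans several lines.

However, it looks like your input is HTML. If that's the case, you should not use a regular expression; parse the HTML with lxml, BeautifulSoup, or another HTML parser instead.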
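Both approaches can be sketched roughly as follows. This is a minimal illustration, not a definitive solution, and it makes several assumptions: Python 3, the whole input held in a single string shaped like the sample in the question, and re.DOTALL added so the group can span line breaks. For the parser route it swaps in the standard library's html.parser (rather than lxml or BeautifulSoup) purely to stay dependency-free; the StrongTextCollector class name is hypothetical.

```python
import re
from html.parser import HTMLParser

# Sample input modelled on the question (assumed to arrive as one string).
html = """Fixed unicode text:
<br>
<strong>
text I am looking for
</strong>"""

# Approach 1: regex with a capture group.
# In Python 3 every str is already Unicode, so no u prefix is needed.
# re.DOTALL (an assumption about the input) lets "." match newlines.
pattern = re.compile(r"Fixed unicode text:(.*)", re.DOTALL)
match = pattern.search(html)
remaining = match.group(1) if match else None  # everything after the fixed text


# Approach 2: an HTML parser (stdlib html.parser used here as a stand-in).
class StrongTextCollector(HTMLParser):
    """Hypothetical helper: collects the text found inside <strong> tags."""

    def __init__(self):
        super().__init__()
        self.in_strong = False
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "strong":
            self.in_strong = True

    def handle_endtag(self, tag):
        if tag == "strong":
            self.in_strong = False

    def handle_data(self, data):
        # Keep only text that appears between <strong> and </strong>.
        if self.in_strong:
            self.chunks.append(data)


collector = StrongTextCollector()
collector.feed(html)
strong_text = "".join(collector.chunks).strip()
```

The regex version returns the raw remainder, markup included, while the parser version yields only the text inside the strong element, which is usually what you want from HTML input.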