I have the following text (the code is for Python 2.6):
txt="foo<br><br><b>bar :</b><br>foo<br><b>bar :</b>"
Then I tried to extract the contents of any tag (<b> tag in this example):
import re
r = re.compile("<%s.*?>(.+?)</%s>" % ("b", "b"), re.I | re.S)
This mostly works, but the output is not what I expected for my tricky text:
>>> re.findall(r, txt)
['<br><b>bar :', 'foo<br><b>bar :']
Is it possible to write one regular expression to extract the text from any HTML tag in any case?
This regex pattern will get all text within the tags.
http://regexr.com?30oga
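
To see why the pattern in the question misbehaves: `<b.*?>` also matches `<br>`, because `.*?` is free to consume the `r`, so each match starts at a `<br>` instead of a `<b>`. Below is a minimal sketch of one possible fix. It is not necessarily the pattern behind the link above. The helper name `tag_contents` is made up for illustration, and the sketch assumes the tags are not nested inside tags of the same name and that attribute values contain no `>`.

```python
import re

def tag_contents(tag, html):
    # Hypothetical helper, not part of the original question or answer.
    # \b     forces the tag name to end right there, so "b" cannot match "br".
    # [^>]*  skips any attributes up to the closing ">" of the opening tag.
    # (.*?)  lazily captures everything up to the nearest closing tag.
    pattern = re.compile(
        r"<%s\b[^>]*>(.*?)</%s\s*>" % (re.escape(tag), re.escape(tag)),
        re.I | re.S,
    )
    return pattern.findall(html)

txt = "foo<br><br><b>bar :</b><br>foo<br><b>bar :</b>"
print(tag_contents("b", txt))  # ['bar :', 'bar :']
```

This covers the text in the question, but a single regular expression cannot handle every HTML case (for example, a tag nested inside another tag of the same name), so for arbitrary markup an HTML parser is the safer choice.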