I need to extract info from a website that wraps it in either <font color="red">needed-info-here</font> or <span style="font-weight:bold;">needed-info-here</span>, seemingly at random.
I can get it when I use
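import re

start = '<font color="red">'
end = '</font>'
expression = start + '(.*?)' + end  # non-greedy capture of the text between the tags
match = re.compile(expression).search(web_source_code)  # web_source_code holds the page's HTML
needed_info = match.group(1)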
but then I have to pick either <font> or <span>, and the match fails whenever the site uses the other tag.
How do I modify the regular expression so it would always succeed?
You can join two alternatives with a vertical bar (|), for the opening tag and for the closing tag alike,
since you know that a font tag will always be closed by
</font>, a span tag always by </span>.
However, consider also using a solid HTML parser such as BeautifulSoup rather than rolling your own regular expressions: HTML is, in general, particularly unsuitable for parsing with regular expressions.
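
For illustration, here is a minimal sketch of the vertical-bar approach applied to the two tags from the question. The function name extract_info is hypothetical, and the pattern assumes the opening tags appear exactly as written in the question (same attributes, same quoting, same spacing).

```python
import re

# (?:...) groups the alternatives without capturing them, so the text
# between the tags is still the only capturing group, i.e. group(1).
start = '<(?:font color="red"|span style="font-weight:bold;")>'
end = '</(?:font|span)>'
pattern = re.compile(start + '(.*?)' + end)

def extract_info(web_source_code):
    """Return the text between the tags, or None if neither tag is found."""
    match = pattern.search(web_source_code)
    return match.group(1) if match else None

print(extract_info('<p><font color="red">needed-info-here</font></p>'))
print(extract_info('<span style="font-weight:bold;">needed-info-here</span>'))
```

Note that this pattern does not check that the closing tag pairs with the opening one; it relies on the page always closing each tag with its own kind. Returning None when nothing matches avoids the AttributeError that match.group(1) raises on a failed search.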