Possible Duplicate:
Strip html from strings in python
While making a small browser-like application, I am facing the problem of splitting the text out from the different tags. Consider the string
<html> <h1> good morning </h1> welcome </html>
I need the following output:
['good morning', 'welcome']
How can I do that in Python?
You can use one of Python's HTML/XML parsers.
Beautiful Soup is popular. lxml is popular too.
The above are third-party packages; you could use the standard library too:
http://docs.python.org/library/xml.etree.elementtree.html
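As a rough sketch of the standard-library route, something like the following should work for the string in the question. It assumes the markup is well-formed XML, as that example is; ElementTree will raise a parse error on typical messy HTML, where Beautiful Soup or lxml would be the safer choice. The function name extract_text is only illustrative.

```python
import xml.etree.ElementTree as ET


def extract_text(markup):
    """Return the non-empty text chunks of well-formed markup, in document order."""
    root = ET.fromstring(markup)
    # itertext() yields each element's text and tail in document order;
    # strip the surrounding whitespace and drop chunks that are blank.
    return [chunk.strip() for chunk in root.itertext() if chunk.strip()]


print(extract_text("<html> <h1> good morning </h1> welcome </html>"))
# prints ['good morning', 'welcome']
```

Note that this splits on tag boundaries, so text interrupted by an inline tag (for example a <b> inside a sentence) would come back as separate list items.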