from mechanize import Browser
br = Browser()
br.open('http://somewebpage')
# Read the response body as a list of lines
html = br.response().readlines()
for line in html:
    print(line)
When printing a line from an HTML file, I want to show only the text content of each HTML element, not the markup itself. For example, '<a href="whatever.example">some text</a>' should print 'some text', '<b>hello</b>' should print 'hello', and so on. How would one go about doing this?
I have always used this function to strip HTML tags, since it requires only the Python standard library:
For Python 3:
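A minimal sketch of a stdlib-only tag stripper, assuming the approach is to subclass `html.parser.HTMLParser` and keep only the text it reports. The names `MLStripper` and `strip_tags` are illustrative, not part of the standard library.

```python
from html.parser import HTMLParser


class MLStripper(HTMLParser):
    """Collects only the text found between tags (illustrative name)."""

    def __init__(self):
        # convert_charrefs=True turns entities such as &amp; into plain text
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_data(self, data):
        # Called by the parser for each run of text outside of tags
        self.parts.append(data)

    def get_data(self):
        return ''.join(self.parts)


def strip_tags(html):
    stripper = MLStripper()
    stripper.feed(html)
    stripper.close()  # flush any text the parser is still buffering
    return stripper.get_data()


print(strip_tags('<a href="whatever.example">some text</a>'))
print(strip_tags('<b>hello</b>'))
```

Note that the contents of `<script>` and `<style>` elements are also reported as text by the parser, so this sketch keeps them; it is meant for display, not for sanitizing untrusted input.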
For Python 2:
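A comparable sketch for Python 2, assuming the same idea with the older `HTMLParser` module. The names `MLStripper` and `strip_tags` are again illustrative, and the import fallback is only there so the snippet can also be tried on Python 3.

```python
try:
    from HTMLParser import HTMLParser  # Python 2 module name
except ImportError:
    from html.parser import HTMLParser  # fallback so the sketch runs on Python 3


class MLStripper(HTMLParser):
    """Collects only the text found between tags (illustrative name)."""

    def __init__(self):
        # HTMLParser is an old-style class in Python 2, so call the base
        # initializer directly instead of using super()
        HTMLParser.__init__(self)
        self.parts = []

    def handle_data(self, data):
        # Called by the parser for each run of text outside of tags
        self.parts.append(data)

    def get_data(self):
        return ''.join(self.parts)


def strip_tags(html):
    stripper = MLStripper()
    stripper.feed(html)
    stripper.close()  # process any text the parser is still buffering
    return stripper.get_data()


print(strip_tags('<b>hello</b>'))
```

On Python 2 the parser reports entities such as `&amp;` through separate `handle_entityref` and `handle_charref` callbacks, so this sketch drops them unless those methods are overridden as well.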