I have &#146; in my HTML file (which is a right curly quote) and I want to convert it to text, if possible.
I tried using HTMLParser and BeautifulSoup, but without success.
>>> import HTMLParser
>>> h = HTMLParser.HTMLParser()
>>> h.unescape("&#39;")
u"'"
>>> h.unescape("&#146;")
u'\x92'  # I was hoping for a right curly quote here.
My goal is simple: take the HTML input and output all of the text, without any HTML codes.
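The “right curly quote” is not an ASCII character.
u'\x92' is Python's representation of a Unicode character, not some “HTML code”. Note, however, that it is the code point U+0092, a C1 control character, and not the right curly quote itself (U+2019): 146 (0x92) is the quote's byte value in Windows-1252, and unescape turns the number straight into a code point. To get the quote and display it properly in your terminal, reinterpret that value as Windows-1252 and then encode the result:
print h.unescape("&#146;").encode('latin-1').decode('cp1252').encode('utf-8') (or whatever your terminal's charset is in place of 'utf-8').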
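For reference, here is a minimal sketch of the same idea on Python 3 (an assumption; the session above is Python 2). There, html.unescape follows the HTML5 rules and already maps numeric references in the 0x80-0x9F range through Windows-1252, and html.parser.HTMLParser with convert_charrefs=True applies the same unescaping to text nodes. The names fix_c1_controls, TextExtractor, and html_to_text are hypothetical helpers made up for this example, not part of any library.

```python
import html
from html.parser import HTMLParser


def fix_c1_controls(text):
    """Reinterpret C1 control characters (U+0080..U+009F) as Windows-1252 bytes.

    Useful for strings that were already unescaped by a tool that mapped
    &#146; directly to U+0092.
    """
    out = []
    for ch in text:
        if 0x80 <= ord(ch) <= 0x9F:
            try:
                ch = bytes([ord(ch)]).decode("cp1252")
            except UnicodeDecodeError:
                pass  # a few values (e.g. 0x81) are undefined in cp1252; keep as-is
        out.append(ch)
    return "".join(out)


class TextExtractor(HTMLParser):
    """Collects only the text nodes of a document, dropping the tags."""

    def __init__(self):
        # convert_charrefs=True makes the parser unescape entities in text.
        super().__init__(convert_charrefs=True)
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def html_to_text(markup):
    """Return the text content of an HTML string with entities decoded."""
    parser = TextExtractor()
    parser.feed(markup)
    parser.close()
    return "".join(parser.chunks)


if __name__ == "__main__":
    print(html.unescape("&#146;"))               # unescaped per the HTML5 rules
    print(fix_c1_controls("\x92"))               # repairs an already-unescaped value
    print(html_to_text("<p>It&#146;s here</p>"))
```

On Python 3, print encodes to the terminal's charset on its own, so the explicit .encode('utf-8') step is not needed there. This sketch keeps the text of every element, including script and style contents, so a real page would likely need extra filtering.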