What’s the easiest way of decoding a string such that:
'Bayern M&#xFC;nchen' -> 'Bayern München'
I’m looking for something lightweight. A string replace might be good enough, although a more robust solution would make me happier. I was hoping that the encode and decode methods would help, but I’ve had no luck so far.
For context, I’m scraping a small amount of information from a web page and don’t want a heavyweight solution (I looked at Scrapy, but whilst great, it’s way too much for me). The page reports a UTF-8 encoding, but I don’t know how to go from that to a string with an umlaut that I can print to the user.
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
I have tried to research this but none of the other answers on SO or beyond have helped me. Beautiful Soup doesn’t handle these hex codes for example.
This is my first real issue with encodings, so sorry if I’ve opened a can of worms; please bear with me.
Looks like this will work in Python 2.6 or later:
Technically this is “internal” and undocumented, but it’s been in the API quite a while and isn’t marked with a leading underscore.
Found it here; other approaches are also mentioned, of which BeautifulSoup is probably the best if you don’t mind its “heaviness.”
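For reference, here is a minimal sketch of the kind of call described above. It assumes the undocumented method in question is `unescape` on `HTMLParser.HTMLParser` (the Python 2 spelling; that identification is my assumption, not something stated in the text). On Python 3.4 and later, the documented `html.unescape` function does the same job, so the sketch tries that first. The helper name `decode_entities` and the sample input are made up for illustration.

```python
try:
    # Python 3.4+: documented standard-library function
    from html import unescape
except ImportError:
    # Python 2.6+ fallback (assumed to be the undocumented method meant above)
    from HTMLParser import HTMLParser
    unescape = HTMLParser().unescape


def decode_entities(text):
    """Replace HTML character references (e.g. &#xFC; or &uuml;) with the characters they name."""
    return unescape(text)


# Hypothetical input: the club name with the umlaut written as a hex character reference.
raw = "Bayern M&#xFC;nchen"
decoded = decode_entities(raw)  # 'Bayern München'
```

Named references such as `&uuml;` and decimal ones such as `&#252;` go through the same call, so no hand-written string replace table is needed.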