I’ve always found character sets and encodings complicated to understand and here I’m faced with another problem. My apologies for any inaccuracies. I’ll do my best.
I’m requesting data from a server which returns JSON. In the HTTP headers it also returns the character set like so:
Content-Type: text/html; charset=UTF-8
I’m using Python’s json library to parse the response with the json.loads method. When I pass it the returned JSON, it gives me a dictionary whose strings are unicode objects. I’ve Googled around and I understand that JSON parsers return Unicode because JavaScript strings are Unicode. How can I load the JSON as UTF-8 instead? I would like to use the same encoding as specified in the response header.
I’ve read this post but it didn’t help.
Thank you.
json.loads automatically handles strs that are passed to it in UTF-8, so, in this specific case, you shouldn’t have to worry about charsets yourself. loads is already converting from UTF-8 to Python’s internal Unicode representation (UCS-2 or UCS-4, depending on how Python was built) for you.
Unless you have some other reason why you really need to operate on the original UTF-8, you should feel fine, even though you’re passing in a str and getting back unicodes. You can also specify the input encoding as the second parameter to loads if you want to be sure or if you’re dealing with varying charsets.
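For what it's worth, here is a minimal sketch of honoring the charset from the response header. It is written in Python 3 syntax, where the body arrives as bytes and is decoded explicitly before parsing, rather than passing an encoding to loads. The helper names charset_from_content_type and load_json_body are made up for illustration, and the sample body is invented. If you really do need UTF-8 bytes afterwards, encode the individual strings once the JSON has been parsed.

```python
import json
from email.message import Message


def charset_from_content_type(header_value, default="utf-8"):
    """Return the charset named in a Content-Type header value (hypothetical helper)."""
    msg = Message()
    msg["Content-Type"] = header_value
    # get_content_charset() lowercases the value and returns None if it is absent.
    return msg.get_content_charset() or default


def load_json_body(body, content_type):
    """Decode raw response bytes with the header's charset, then parse the JSON."""
    encoding = charset_from_content_type(content_type)
    return json.loads(body.decode(encoding))


# Invented sample: the UTF-8 bytes of {"name": "Zoë"} plus the header from the question.
raw_body = b'{"name": "Zo\xc3\xab"}'
data = load_json_body(raw_body, "text/html; charset=UTF-8")

# The parsed strings are Unicode text; encode them only where bytes are required.
utf8_name = data["name"].encode("utf-8")
```

Keeping the parsed data as Unicode and encoding only at the boundary (file, socket, database) tends to be the less error-prone choice.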