This doesn’t appear to be possible to me using the standard library json module. When using json.dumps it will automatically escape all non-ASCII characters then encode the string to ASCII. I can specify that it not escape non-ASCII characters, but then it crashes when it tries to convert the output to ASCII.
The problem is – I don’t want ASCII! I just want my JSON string back as a unicode (or UTF-8) string. Are there any convenient ways to do that?
Here’s an example to demonstrate what I want:
d = {'navn': 'Åge', 'stilling': 'Lærling'}
json.dumps(d, output_encoding='utf8')
# => '{"stilling": "Lærling", "navn": "Åge"}'
But of course, there is no such option as output_encoding, so here’s the actual output:
d = {'navn': 'Åge', 'stilling': 'Lærling'}
json.dumps(d)
# => '{"stilling": "L\\u00e6rling", "navn": "\\u00c5ge"}'
So to summarize – I want to convert a Python dict to a UTF-8 JSON string without any escapes. How can I do that?
I’ll accept solutions like:
- Hacks (pre- and post-processing the input to dumps to achieve the desired effect)
- Subclassing the JSONEncoder (I have no idea how it works and the documentation isn’t very helpful)
- Third-party libraries available on PyPI
Requirements
Make sure your Python files are encoded in UTF-8. Otherwise your non-ASCII characters will become question marks (?). Notepad++ has excellent encoding options for this.
Make sure that you have the appropriate fonts installed. If you want to display Japanese characters then you need to install Japanese fonts.
Make sure that your IDE supports displaying unicode characters.
Otherwise you might get a UnicodeEncodeError.
Example:
PyScripter works for me. It’s included with “Portable Python” at http://portablepython.com/wiki/PortablePython3.2.1.1
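To see where a UnicodeEncodeError like this comes from, here is a minimal sketch, assuming Python 3. It checks which encoding the console or IDE output pane reports and shows what happens when a non-ASCII string is forced into ASCII. The variable names are only illustrative.

```python
import sys

name = 'Åge'

# The encoding Python uses when printing to this console or IDE pane.
# If it reports an ASCII-only encoding, printing non-ASCII text can fail.
print(sys.stdout.encoding)

# Strict encoding to ASCII raises UnicodeEncodeError for non-ASCII characters.
try:
    name.encode('ascii')
    failed = False
except UnicodeEncodeError:
    failed = True

# An error handler can substitute an escape sequence instead of raising.
fallback = name.encode('ascii', errors='backslashreplace')
print(fallback)  # b'\\xc5ge'
```

If the reported encoding is UTF-8, printing the JSON string directly should work without any fallback.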
Problem
json.dumps() escapes non-ASCII characters by default.
Solution
Read the update at the bottom. Or…
Replace each escaped character with the parsed Unicode character.
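A minimal sketch of that replacement step, assuming Python 3. The name decode_unicode_escapes is hypothetical and this is an illustration of the idea rather than the exact function from this answer. It deliberately leaves control characters, surrogate halves, quotes, and backslashes escaped so that the result stays valid JSON.

```python
import json
import re

# Matches an escaped backslash (kept as-is so a literal "\\u00e6" in the data
# is not mistaken for an escape) or a \uXXXX escape sequence.
_ESCAPE = re.compile(r'\\\\|\\u([0-9a-fA-F]{4})')


def decode_unicode_escapes(json_text):
    """Replace \\uXXXX escapes in JSON text with the characters they encode."""
    def replace(match):
        hex_digits = match.group(1)
        if hex_digits is None:
            return match.group(0)  # escaped backslash, leave untouched
        code_point = int(hex_digits, 16)
        # Keep escapes that JSON requires or that cannot stand alone.
        if code_point < 0x20 or 0xD800 <= code_point <= 0xDFFF:
            return match.group(0)
        if chr(code_point) in '"\\':
            return match.group(0)
        return chr(code_point)

    return _ESCAPE.sub(replace, json_text)


d = {'navn': 'Åge', 'stilling': 'Lærling'}
escaped = json.dumps(d)    # non-ASCII characters come back as \uXXXX escapes
decoded = decode_unicode_escapes(escaped)
# decoded is now '{"navn": "Åge", "stilling": "Lærling"}'
```

Because surrogate escapes are left alone, characters outside the Basic Multilingual Plane stay escaped in this sketch.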
I created a simple lambda function called getStringWithDecodedUnicode that does just that.
Here’s getStringWithDecodedUnicode as a regular function.
Example
testJSONWithUnicode.py (Using PyScripter as the IDE)
Output
Update
Or… just pass ensure_ascii=False as an option for json.dumps.
Note: You need to meet the requirements that I outlined at the beginning or else this isn’t going to work.
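As a short illustration of the ensure_ascii=False option, here is a sketch assuming Python 3, where json.dumps returns a str and the UTF-8 bytes are produced by a separate encode step. The variable names are only for the example.

```python
import json

d = {'navn': 'Åge', 'stilling': 'Lærling'}

# With ensure_ascii=False the non-ASCII characters are kept as they are.
text = json.dumps(d, ensure_ascii=False)
# text is now '{"navn": "Åge", "stilling": "Lærling"}'

# Encode explicitly when UTF-8 bytes are needed, e.g. for a file or socket.
utf8_bytes = text.encode('utf-8')
```

When writing to a file, opening it with encoding='utf-8' and passing ensure_ascii=False to json.dump should give the same result without the manual encode step.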