Inside a Twisted Resource, I am returning a JSON-encoded dict (the response variable below) as the response. The data is a list of 5 people, each with a name, a guid, and a couple of other fields under 32 characters long, so it is not a lot of data.
I get this OverflowError exception pretty often, but I don’t quite understand what the unsupported utf-8 sequence length refers to.
self.request.write(ujson.dumps(response))
exceptions.OverflowError: Unsupported UTF-8 sequence length when encoding string
When in doubt, check the source: http://code.google.com/p/rapidjson/source/browse/trunk/thirdparty/ultrajson/ultrajsonenc.c
This error is raised when the encoder meets a UTF-8 sequence that is 5 or 6 bytes long, which this JSON implementation doesn’t support. Those characters won’t work if you’re using the data in a browser anyway, since they’re outside the range of UTF-16, which stops at U+10FFFF.
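To make the byte lengths concrete, here is a minimal sketch of how the original UTF-8 scheme maps code point ranges to sequence lengths. The function name `utf8_sequence_length` is made up for illustration and is not part of ujson; the code is written for Python 3.

```python
def utf8_sequence_length(code_point: int) -> int:
    """Bytes needed for code_point under the original 1-to-6-byte UTF-8 scheme."""
    if code_point < 0:
        raise ValueError("code point must be non-negative")
    # Exclusive upper bound of each length class, shortest first.
    limits = [0x80, 0x800, 0x10000, 0x200000, 0x4000000, 0x80000000]
    for length, limit in enumerate(limits, start=1):
        if code_point < limit:
            return length
    raise ValueError("code point too large for UTF-8")


if __name__ == "__main__":
    for cp in (0x41, 0xE9, 0x20AC, 0x1F600, 0x10FFFF, 0x200000):
        print(f"U+{cp:04X}: {utf8_sequence_length(cp)} byte(s)")
```

The highest valid Unicode code point, U+10FFFF, still fits in 4 bytes; U+200000 is the first value that would need 5.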
I’d be surprised if this actually happened often with well-formed text; it’d only happen with code points above U+1FFFFF, which lie beyond Unicode’s upper limit of U+10FFFF and so can’t even be stored in a Python Unicode string. You should find out why these sequences are showing up in your data.
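One way to start looking is to scan the raw bytes you are about to serialize for lead bytes that announce a 5- or 6-byte sequence (0xF8 to 0xFB and 0xFC to 0xFD in the original UTF-8 scheme). This is only a diagnostic sketch: the helper name `find_long_sequence_leads` is hypothetical, and the guess that the data might contain byte strings in some other encoding such as Latin-1 is an assumption, not something the question confirms.

```python
def find_long_sequence_leads(data: bytes):
    """Return (offset, byte) pairs for bytes that would start a 5- or 6-byte sequence."""
    # Valid UTF-8 never contains bytes in 0xF8..0xFD, so any hit is suspicious.
    return [(i, b) for i, b in enumerate(data) if 0xF8 <= b <= 0xFD]


if __name__ == "__main__":
    # Assumed example: the same name encoded two different ways.
    print(find_long_sequence_leads("Jürgen".encode("utf-8")))
    print(find_long_sequence_leads("Jürgen".encode("latin-1")))
```

Running a check like this over each field before calling ujson.dumps would show which record, if any, carries the offending bytes.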