I have a very simple JSON string that I can’t parse with the simplejson module.
Reproduction:
import simplejson as json
json.loads(r'{"translatedatt1":"Vari\351es"}')
Result:
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/pymodules/python2.5/simplejson/__init__.py", line 307, in loads
    return _default_decoder.decode(s)
  File "/usr/lib/pymodules/python2.5/simplejson/decoder.py", line 335, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/usr/lib/pymodules/python2.5/simplejson/decoder.py", line 351, in raw_decode
    obj, end = self.scan_once(s, idx)
ValueError: Invalid \escape: line 1 column 23 (char 23)
Does anyone have an idea what’s wrong and how to parse the JSON above correctly?
The string that is encoded there is: Variées
P.S. I’m using Python 2.5.
Thanks a lot!
That error is quite correct; Vari\351es contains an invalid escape. The JSON standard does not allow a \ followed by just digits.
Whatever produced that output should be fixed. If that is impossible, you’ll need to use a regular expression to either remove those escapes or replace them with valid escapes.
If we interpret the number 351 as octal, it points to the Unicode code point U+00E9, the é character (LATIN SMALL LETTER E WITH ACUTE). You can ‘repair’ your JSON input by substituting each such escape with the character it denotes; with a repair() function along those lines, your example can be loaded.
You may need to adjust the interpretation of the code points; I chose octal (because Variées is an actual word), but you need to test this more with other code points.
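A minimal sketch of such a repair() function, assuming the escapes are octal and at most three digits long. The names invalid_escape and to_unicode_escape are illustrative, and the standard-library json module stands in for simplejson here (on Python 2.5, use import simplejson as json instead). Each invalid escape is rewritten as a valid JSON \uXXXX escape, so the parser does the actual decoding:

```python
import json
import re

# A backslash followed by one to three octal digits, e.g. \351.
# Assumption: the producer emits octal escapes of at most three digits.
invalid_escape = re.compile(r'\\([0-7]{1,3})')

def to_unicode_escape(match):
    # Read the digits as an octal code point and emit a valid JSON escape.
    return '\\u%04x' % int(match.group(1), 8)

def repair(broken_json):
    # Replace every invalid octal escape with its \uXXXX equivalent.
    return invalid_escape.sub(to_unicode_escape, broken_json)

data = json.loads(repair(r'{"translatedatt1":"Vari\351es"}'))
# data["translatedatt1"] now holds the string Variées
```

This simple pattern does not distinguish an escaped backslash followed by digits (\\351) from a bare octal escape, so check it against real input before relying on it.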