import json
import urllib
import re
import binascii
def asciirepl(match):
    # Turn a matched \xNN escape into the character it encodes
    s = match.group()
    return binascii.unhexlify(s[2:])
query = 'google'
p = urllib.urlopen('http://www.google.com/dictionary/json?callback=a&q='+query+'&sl=en&tl=en&restrict=pr,de&client=te')
page = p.read()[2:-10]  # Strip the JSONP wrapper, since the response is returned as a function call
# Replace the \xNN hex escapes with the characters they encode
p = re.compile(r'\\x(\w{2})')
ascii_string = p.sub(asciirepl, page)
# Now decode the cleaned JSON response
data = json.loads(ascii_string)
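The fixed slice `[2:-10]` depends on the exact length of the JSONP wrapper. A slightly more defensive Python sketch strips the callback name and parentheses and parses the remaining argument list as a JSON array. The helper `unwrap_jsonp` and the sample response are hypothetical, and the trailing `,200,null` arguments are only a guess at what the last ten characters contained. It would still have to run after the `\x` cleanup, since raw `\x` escapes are not valid JSON.

```python
import json

def unwrap_jsonp(text, callback="a"):
    """Parse the arguments of a JSONP call `callback(arg1, arg2, ...)` into a list."""
    text = text.strip()
    prefix = callback + "("
    if not (text.startswith(prefix) and text.endswith(")")):
        raise ValueError("not a %s(...) JSONP response" % callback)
    # Turn "arg1,arg2,..." into the JSON array "[arg1,arg2,...]" and parse it.
    return json.loads("[" + text[len(prefix):-1] + "]")

# Hypothetical response; the trailing ",200,null" arguments are a guess.
sample = 'a({"query": "google", "primaries": []},200,null)'
args = unwrap_jsonp(sample)
print(args[0]["query"])  # google
print(args[1:])          # [200, None]
```

Here `args[0]` plays the role of `data`, and no character count has to match the server's output exactly.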
Running the script (saved as define.py), I get this error:
shadyabhi@archlinux /tmp $ python2 define.py
Traceback (most recent call last):
  File "define.py", line 19, in <module>
    data = json.loads(ascii_string)
  File "/usr/lib/python2.7/json/__init__.py", line 326, in loads
    return _default_decoder.decode(s)
  File "/usr/lib/python2.7/json/decoder.py", line 366, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/usr/lib/python2.7/json/decoder.py", line 382, in raw_decode
    obj, end = self.scan_once(s, idx)
ValueError: Expecting , delimiter: line 1 column 403 (char 403)
As far as I can tell, the JSON is valid, since I received it from Google’s server. All I did was replace the hex escapes. Any help would be highly appreciated.
Decoding the \x escapes may produce " characters, which need to be re-escaped because they appear inside strings in the JSON data.
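A minimal Python 3 sketch of the failure and of the re-escaping fix, on a hypothetical fragment (the real response body isn't shown). `naive` mirrors what `asciirepl` does; `reescape` is a hypothetical helper that puts a backslash back in front of a decoded `"` or `\`. The pattern is narrowed to hex digits instead of `\w`.

```python
import json
import re

# Literal backslash, "x", then two hex digits (e.g. \x22, the escape for ").
pattern = re.compile(r'\\x([0-9a-fA-F]{2})')

# Hypothetical fragment of a response containing escaped quotes.
raw = r'{"text": "the \x22google\x22 search engine"}'

def naive(match):
    # Same idea as the question's asciirepl: \x22 becomes a bare "
    return chr(int(match.group(1), 16))

def reescape(match):
    # Decode the character, then re-escape it if it would end the JSON string.
    ch = chr(int(match.group(1), 16))
    return '\\' + ch if ch in ('"', '\\') else ch

try:
    json.loads(pattern.sub(naive, raw))
except ValueError as exc:  # json.JSONDecodeError is a ValueError subclass
    print("naive decode failed:", exc)

print(json.loads(pattern.sub(reescape, raw))["text"])  # the "google" search engine
```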
Even with quotes re-escaped, that still won’t handle control characters, so you might instead convert the \x escapes into \u escapes, which are defined by the JSON standard and parsed by the json module. This has the side benefit of being simpler 🙂
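A Python 3 sketch of the `\u` approach, assuming each `\xNN` stands for code point U+00NN (as it does in JavaScript string literals). `hex_to_unicode_escapes` and the sample string are hypothetical. Passing a function as the replacement means the returned text is inserted verbatim, so there is no backslash processing of a replacement template to worry about.

```python
import json
import re

# Literal backslash, "x", then two hex digits, e.g. \x22 or \x3c.
HEX_ESCAPE = re.compile(r'\\x([0-9a-fA-F]{2})')

def hex_to_unicode_escapes(text):
    """Rewrite each \\xNN escape as the equivalent JSON escape \\u00NN."""
    return HEX_ESCAPE.sub(lambda m: '\\u00' + m.group(1), text)

# Hypothetical fragment with a quote, a newline and angle brackets escaped.
raw = r'{"text": "say \x22hi\x22\x0a\x3cb\x3e"}'
data = json.loads(hex_to_unicode_escapes(raw))
print(repr(data["text"]))  # 'say "hi"\n<b>'
```

Because `\u0022`, `\u005c` and `\u000a` are all legal JSON escapes, quotes, backslashes and control characters come out of `json.loads` correctly with no special-casing. Like the question's regex, this does not notice a `\x` that is itself preceded by an escaped backslash (`\\x41`).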