I’m trying to write a Python script that posts some JSON to a web server and gets some JSON back. I patched together a few different examples on StackOverflow, and I think I have something that’s mostly working.
import urllib2
import json
url = "http://foo.com/API.svc/SomeMethod"
payload = json.dumps( {'inputs': ['red', 'blue', 'green']} )
headers = {"Content-type": "application/json;"}
req = urllib2.Request(url, payload, headers)
f = urllib2.urlopen(req)
response = f.read()
f.close()
data = json.loads(response) # <-- Crashes
The last line throws an exception:
ValueError: No JSON object could be decoded
When I look at response, I see valid JSON, but the first few characters are a BOM:
>>> response
'\xef\xbb\xbf[\r\n {\r\n ... Valid JSON here
So, if I manually strip out the first three bytes:
data = json.loads(response[3::])
Everything works and response is turned into a dictionary.
My Question:
It seems kinda silly that json barfs when you give it a BOM. Is there anything different I can do with urllib or the json library to let it know this is a UTF8 string and to handle it as such? I don’t want to manually strip out the first 3 bytes.
You should probably yell at whoever’s running this service, because a BOM on UTF-8 text makes no sense. The BOM exists to disambiguate byte order, and UTF-8 is defined as being little-endian.
That said, ideally you should decode bytes before doing anything else with them. Luckily, Python has a codec that recognizes and removes the BOM:
utf-8-sig.So you just need: