I have a JSON string that I’ve scraped from a webpage, and I’m trying to use json.loads() to turn it into a Python dictionary. However, some of the values in the JSON string contain unescaped double quotes, for example:
'{"title": "The "Star Wars Kid": Where is he now?"}'
Obviously this is not a proper JSON string, and json.loads() complains. Using something like string.replace('"', '\\"') doesn’t work either, since this is a single string, and doing so would affect the double quotes that are correct as well as the bad ones.
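A minimal reproduction of both failures, assuming Python 3.5 or later (where json.JSONDecodeError exists); the variable names are only illustrative:

```python
import json

broken = '{"title": "The "Star Wars Kid": Where is he now?"}'

# Attempt 1: parse the string as-is. The parser sees the string "The "
# end early and then hits unexpected text.
try:
    json.loads(broken)
    parsed_ok = True
except json.JSONDecodeError:
    parsed_ok = False

# Attempt 2: escape every double quote. This also escapes the quotes
# that delimit keys and values, so the result is still not valid JSON.
escaped = broken.replace('"', '\\"')
try:
    json.loads(escaped)
    escaped_ok = True
except json.JSONDecodeError:
    escaped_ok = False

print(parsed_ok, escaped_ok)
```

Both attempts fail, which is why the quotes have to be dealt with before they are decoded.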
By the way, this does not cause an HtmlXPathSelector error when scraping because on the webpage, the bad quotes are encoded like so:
'{"title": "The &quot;Star Wars Kid&quot;: Where is he now?"}'
How can I parse this string correctly with json.loads()?
EDIT: I understand that it would be simple to parse the string before the encoded quotes have been decoded (as in the second example), so I guess what I’m really asking is how to get this kind of still-encoded result from a Python HtmlXPathSelector.
If the HTML document I’m scraping contains this string:
'{"title": "The &quot;Star Wars Kid&quot;: Where Is He Now?"}'
How can I get HtmlXPathSelector to return that exact string without decoding the encoded quotation marks?
This is the point at which you want to decode the JSON: parse it while the quotes are still encoded, then replace the “bad quotes” in the parsed values afterward.
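A minimal sketch of that order of operations, assuming the quotes are still encoded as the HTML entity &quot; and that Python 3.4 or later is available for html.unescape. The helper name unescape_strings is hypothetical, and how to obtain the still-encoded string from the selector is not covered here:

```python
import html
import json


def unescape_strings(value):
    """Recursively decode HTML entities in every string of a parsed JSON value."""
    if isinstance(value, str):
        return html.unescape(value)
    if isinstance(value, list):
        return [unescape_strings(item) for item in value]
    if isinstance(value, dict):
        return {unescape_strings(k): unescape_strings(v) for k, v in value.items()}
    return value  # numbers, booleans and None pass through unchanged


# The string as it appears in the page source, with the inner quotes still encoded.
raw = '{"title": "The &quot;Star Wars Kid&quot;: Where is he now?"}'

# Step 1: parse while the inner quotes are still harmless entities.
# Step 2: decode the entities inside the parsed values.
data = unescape_strings(json.loads(raw))
print(data["title"])
```

Because the entities are decoded only after parsing, the structural quotes of the JSON are never touched.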