I am using python2.7 and lxml. My code is as below
import urllib
from lxml import html
def get_value(el):
return get_text(el, 'value') or el.text_content()
response = urllib.urlopen('http://www.edmunds.com/dealerships/Texas/Frisco/DavidMcDavidHondaofFrisco/fullsales-504210667.html').read()
dom = html.fromstring(response)
try:
description = get_value(dom.xpath("//div[@class='description item vcard']")[0].xpath(".//p[@class='sales-review-paragraph loose-spacing']")[0])
except IndexError, e:
description = ''
The code crashes inside the try, giving an error
UnicodeDecodeError at /
'utf8' codec can't decode byte 0x92 in position 85: invalid start byte
The string that could not be encoded/decoded was: ouldn�t be
I have tried using a lot of techniques including .encode(‘utf8’), but none does solve the problem. I have 2 question:
- How to solve this problem
- How can my app crash when the problem code is between a try except
The page is being served up with
charset=ISO-8859-1. Decode from that to unicode.[![Snapshot of details from a browser. Credit @Old Panda]](https://i.stack.imgur.com/jVHTy.png)