I am trying to scrape a website using lxml and mechanize, and I got an error:
AttributeError: 'NoneType' object has no attribute 'xpath'
After some checking, I found that html was None, i.e. lxml.html.parse(resp).getroot() returned None.
The funny part is that this code works on other websites; it only fails on this particular website (http://www.selangortimes.com).
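import lxml.html
import mechanize

url = 'http://www.selangortimes.com'
br = mechanize.Browser()
br.set_handle_robots(False)   # do not fetch or obey robots.txt
br.set_handle_refresh(False)  # do not follow refresh redirects
br.addheaders = [('User-Agent', 'Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)')]
br.open(url)
resp = br.response()
html = lxml.html.parse(resp).getroot()  # html is None for this site
# expr holds the XPath expression that selects the link elements
link_targets = [link.attrib.get('href') for link in html.xpath(expr)]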
Appreciate your help 🙂
Update:
An example of a working website using the above code – http://www.themalaysianinsider.com
The following slightly revised version of the code you have posted, using lxml 2.3.6 and mechanize 0.2.5, produces a list of all the
href attributes in <a> elements at the http://www.selangortimes.com url. Note, concerning your latest comment, that you have to import lxml.html.
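
As an illustration only, here is a minimal sketch of a more defensive way to collect the href values. It parses the response body as a string with lxml.html.fromstring rather than handing the response object to lxml.html.parse, and it returns an empty list when the body is empty. The helper name extract_hrefs, the default '//a' expression, and the sample markup are assumptions made for this example, not part of the original code.

```python
import lxml.html


def extract_hrefs(body, expr="//a"):
    """Return the href values of the elements matched by expr.

    body is the raw page content (bytes or str). An empty or
    whitespace-only body yields an empty list instead of an error.
    """
    if not body or not body.strip():
        # Nothing to parse, so there are no links to report.
        return []
    root = lxml.html.fromstring(body)
    # Skip matched elements that carry no href attribute (e.g. named anchors).
    return [el.get("href") for el in root.xpath(expr) if el.get("href") is not None]


# Hypothetical sample markup, used here instead of a live request.
sample = (
    b'<html><body>'
    b'<a href="/news">News</a>'
    b'<a name="top">Top</a>'
    b'<a href="http://example.com/">Example</a>'
    b'</body></html>'
)
print(extract_hrefs(sample))
```

With mechanize, the same helper could be fed the page content via extract_hrefs(br.response().read()), which also makes it easy to print the body and check whether the server returned anything at all.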