I am trying to scrape a website using lxml and mechanize, and I got

Editorial Team

Asked: June 12, 2026 (2026-06-12T14:11:21+00:00)

I am trying to scrape a website using lxml and mechanize, and I got an error:

AttributeError: 'NoneType' object has no attribute 'xpath'

After some checking, I found that html was None.

The odd part is that this code works on other websites and fails only on this particular one (http://www.selangortimes.com):

url = 'http://www.selangortimes.com'
br = mechanize.Browser()
br.set_handle_robots(False)
br.set_handle_refresh(False)
br.addheaders = [('User-Agent', 'Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)')]
br.open(url)
resp = br.response()
html = lxml.html.parse(resp).getroot()
link_targets = [link.attrib.get('href') for link in html.xpath(expr)]

Appreciate your help 🙂

Update:
An example of a working website using the above code – http://www.themalaysianinsider.com

1 Answer

Editorial Team · Answer 1 · 2026-06-12T14:11:23+00:00

The following slightly revised version of the code you posted, run with lxml 2.3.6 and mechanize 0.2.5, produces a list of the href attributes of all <a> elements at http://www.selangortimes.com. Regarding your latest comment, note that you have to import lxml.html explicitly; import lxml alone does not make lxml.html available.

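import mechanize
import lxml.html

url = 'http://www.selangortimes.com'
br = mechanize.Browser()
br.set_handle_robots(False)   # do not fetch or obey robots.txt
br.set_handle_refresh(False)  # do not act on refresh directives
br.addheaders = [('User-Agent', 'Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)')]
br.open(url)
resp = br.response()
# Parse the response and take the root element of the document
html = lxml.html.parse(resp).getroot()
# Collect the href attribute of every <a> element (None where it is missing)
link_targets = [link.attrib.get('href') for link in html.xpath('//a')]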
print(link_targets)
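
Since the original error came from calling xpath on None, it may help to guard against a missing root before querying it. The following is only a minimal sketch, not part of the code above: the helper name extract_hrefs and the inline sample document are hypothetical, chosen so the example runs without network access. It assumes lxml is installed.

```python
import lxml.html


def extract_hrefs(root, expr="//a"):
    """Return the href of each element matched by expr, skipping elements without one.

    extract_hrefs is a hypothetical helper for illustration only.
    """
    if root is None:
        # There is no root element to query, so fail with a clear message
        # instead of an AttributeError on None.
        raise ValueError("no HTML root element; check that the response body is not empty")
    return [el.get("href") for el in root.xpath(expr) if el.get("href") is not None]


# Small inline document (an assumption for the demo) instead of a live site.
sample = lxml.html.fromstring(
    '<html><body><a href="/news">News</a><a name="top">Top</a>'
    '<a href="/sports">Sports</a></body></html>'
)
hrefs = extract_hrefs(sample)
print(hrefs)
```

With the scraping code above, the same helper could be called as extract_hrefs(html), so that a None root produces a descriptive error rather than the AttributeError from the question.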
