I have written a Scrapy spider to scrape some HTML tags. The problem is that the spider works perfectly for a URL on a live site on the internet, but not for a URL on localhost. That is, the spider produces an error for the URL of the resource on my local computer, even though that URL is correct, and it works correctly for the same resource when given the URL of the running site.
Can someone clear up this doubt of mine?
    def parse(self, response):
        hxs = HtmlXPathSelector(response)
        con = MySQLdb.connect(host="localhost",
                              user="username",
                              passwd="psswd",
                              db="dbname")
        cur = con.cursor()
        title = hxs.select("//h3")[0].extract()
        desc = hxs.select("//h2").extract()
        a = hxs.select("//meta").extract()
        cur.execute("""Insert into heads(h2) Values(%s )""",(a))
        con.commit()
        con.close()
The error on this line

    title = hxs.select("//h3")[0].extract()

indicates that the list hxs.select("//h3") is empty ([]), since attempting to access the first item (index 0) with hxs.select("//h3")[0] uses an index which Python tells us is out of range. The HTML you are parsing apparently has no <h3> tags.

Also, after you fix the above error, you'll need to put a comma after the a, giving (a,): (a) is evaluated to a, whereas (a,) represents a tuple with 1 element inside.
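
Both points can be checked in plain Python, without Scrapy or a database. The following is only a minimal sketch: the helper first_or_default and the sample strings are hypothetical stand-ins for what a selector query might return, not part of the original spider.

```python
def first_or_default(items, default=None):
    """Return the first element of items, or default when the list is empty."""
    return items[0] if items else default


# Hypothetical stand-ins for the lists a selector query could return.
no_matches = []                  # e.g. a page with no <h3> tags
one_match = ["<h3>Title</h3>"]   # e.g. a page with one <h3> tag

# Indexing no_matches[0] directly would raise IndexError; the guard avoids that.
print(first_or_default(no_matches))  # None
print(first_or_default(one_match))   # <h3>Title</h3>

# Parentheses alone do not create a tuple; the trailing comma does.
a = "<meta charset='utf-8'>"     # hypothetical sample value
print(type((a)).__name__)        # str
print(type((a,)).__name__)       # tuple
```

In the spider, the same guard would mean checking whether hxs.select("//h3") returned anything before taking index 0, and deciding what to store when it did not.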