I'm just trying out Scrapy and trying to get a basic spider working. I know I'm probably just missing something, but I've tried everything I can think of.
The error I get is:
    line 11, in JustASpider
      sites = hxs.select('//title/text()')
    NameError: name 'hxs' is not defined
My code is very basic at the moment, but I still can’t seem to find where I’m going wrong. Thanks for any help!
    from scrapy.spider import BaseSpider
    from scrapy.selector import HtmlXPathSelector

    class JustASpider(BaseSpider):
        name = "google.com"
        start_urls = ["http://www.google.com/search?hl=en&q=search"]

        def parse(self, response):
            hxs = HtmlXPathSelector(response)
            sites = hxs.select('//title/text()')
            for site in sites:
                print site.extract()

    SPIDER = JustASpider()
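
A note on reading the traceback: the failing frame is labeled "in JustASpider" rather than "in parse", which suggests the line ran in the class body, not inside the method. One possible cause (an assumption, since the original file's whitespace is not visible here) is inconsistent indentation, such as mixed tabs and spaces. The sketch below reproduces that kind of error without Scrapy; the `Demo` class, the `demo.py` filename, and `run_demo` are made up for illustration, and it is written to run on Python 3.

```python
import traceback

# Hypothetical stand-in for the spider; no Scrapy required.
# The last line is dedented to class level, so it executes while the
# class body is being evaluated, where the local name `hxs` from
# parse() does not exist.
source = '''
class Demo(object):
    def parse(self, response):
        hxs = response.upper()
    sites = hxs.lower()
'''

def run_demo():
    """Run the snippet and return the formatted traceback, or None."""
    try:
        exec(compile(source, "demo.py", "exec"), {})
    except NameError:
        return traceback.format_exc()
    return None

tb = run_demo()
print(tb)
```

If the frame in the printed traceback is named after the class, as it is here, checking the indentation of the reported line is a reasonable first step.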
I removed the SPIDER call at the end and removed the for loop. There was only one title tag (as one would expect) and it seems that was throwing off the loop. The code I have working is as follows: