I am working with the Scrapy framework. Below is my spider.py code:
class Example(BaseSpider):
    name = "example"
    allowed_domains = {"http://www.example.com"}
    start_urls = [
        "http://www.example.com/servlet/av/search&SiteName=page1"
    ]

    def parse(self, response):
        hxs = HtmlXPathSelector(response)
        hrefs = hxs.select('//table[@class="knxa"]/tr/td/a/@href').extract()
        # href consists of all href tags and i am copying in to forwarding_hrefs by making them as a string
        forwarding_hrefs = []
        for i in hrefs:
            forwarding_hrefs.append(i.encode('utf-8'))
        return Request('http://www.example.com/servlet/av/search&SiteName=page2',
                       meta={'forwarding_hrefs': response.meta['forwarding_hrefs']},
                       callback=self.parseJob)

    def parseJob(self, response):
        print response,">>>>>>>>>>>"
Result:
2012-07-18 17:29:15+0530 [example] DEBUG: Crawled (200) <GET http://www.example.com/servlet/av/search&SiteName=page1> (referer: None)
2012-07-18 17:29:15+0530 [MemorialReqionalHospital] ERROR: Spider error processing <GET http://www.example.com/servlet/av/search&SiteName=page2>
Traceback (most recent call last):
File "/usr/lib64/python2.7/site-packages/twisted/internet/base.py", line 1167, in mainLoop
self.runUntilCurrent()
File "/usr/lib64/python2.7/site-packages/twisted/internet/base.py", line 789, in runUntilCurrent
call.func(*call.args, **call.kw)
File "/usr/lib64/python2.7/site-packages/twisted/internet/defer.py", line 361, in callback
self._startRunCallbacks(result)
File "/usr/lib64/python2.7/site-packages/twisted/internet/defer.py", line 455, in _startRunCallbacks
self._runCallbacks()
--- <exception caught here> ---
File "/usr/lib64/python2.7/site-packages/twisted/internet/defer.py", line 542, in _runCallbacks
current.result = callback(current.result, *args, **kw)
File "/home/local/user/project/example/example/spiders/example_spider.py", line 36, in parse
meta={'forwarding_hrefs': response.meta['forwarding_hrefs']},
exceptions.KeyError: 'forwarding_hrefs'
What I am trying to do is collect all the href values from
http://www.example.com/servlet/av/search&SiteName=page1
and place them in forwarding_hrefs, then use this list in the callback of the next request, which is for
http://www.example.com/servlet/av/search&SiteName=page2
I also want to add the href values from page2 to forwarding_hrefs, loop over the list, and yield a request for each href. That is my idea, but it fails with the error above. What is wrong in the code above? As I understand it, meta is meant for passing data like this from one callback to the next.
Can anyone please tell me how to pass the forwarding_hrefs list from the parse method to the parseJob method?
I hope I explained this well; if not, please let me know.
Thanks in advance
I haven't tried anything, but it seems you have an error here:
    meta={'forwarding_hrefs': response.meta['forwarding_hrefs']},
You are passing response.meta['forwarding_hrefs'], but it doesn't exist for this response.
You need to put:
    meta={'forwarding_hrefs': forwarding_hrefs},
because you already have forwarding_hrefs as a local variable in parse. This way you send it to parseJob inside meta, and inside parseJob you'll be able to access response.meta['forwarding_hrefs'], because it will exist on that response object.
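To make the handoff easier to see, here is a minimal sketch that models it without Scrapy at all. FakeRequest, FakeResponse, the `hrefs` attribute and the /job/... links are all hypothetical stand-ins made up for illustration, not Scrapy classes or real data. The one thing taken from Scrapy is that response.meta is simply the meta dict of the request that produced the response, which is why the key is missing in parse and present in the second callback.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, Iterator, List, Optional


@dataclass
class FakeRequest:
    """Hypothetical stand-in for a request: URL, callback and meta dict."""
    url: str
    callback: Optional[Callable] = None
    meta: Dict[str, Any] = field(default_factory=dict)


@dataclass
class FakeResponse:
    """Hypothetical stand-in for a response; `hrefs` replaces the XPath extraction."""
    url: str
    request: FakeRequest
    hrefs: List[str] = field(default_factory=list)

    @property
    def meta(self) -> Dict[str, Any]:
        # As in Scrapy, response.meta is the meta of the originating request.
        return self.request.meta


def parse(response: FakeResponse) -> FakeRequest:
    # The start request carried no 'forwarding_hrefs' key, so reading
    # response.meta['forwarding_hrefs'] here would raise KeyError.
    forwarding_hrefs = list(response.hrefs)
    return FakeRequest(
        "http://www.example.com/servlet/av/search&SiteName=page2",
        callback=parse_job,
        meta={"forwarding_hrefs": forwarding_hrefs},  # pass the local list
    )


def parse_job(response: FakeResponse) -> Iterator[FakeRequest]:
    # The key exists now, because parse() attached it to the request.
    forwarding_hrefs = response.meta["forwarding_hrefs"] + response.hrefs
    for href in forwarding_hrefs:
        yield FakeRequest(href)


# Simulate the crawl by hand with made-up links for page1 and page2.
start = FakeRequest("http://www.example.com/servlet/av/search&SiteName=page1")
page1 = FakeResponse(start.url, start, hrefs=["/job/1", "/job/2"])
next_request = parse(page1)
page2 = FakeResponse(next_request.url, next_request, hrefs=["/job/3"])
followups = list(next_request.callback(page2))
print([r.url for r in followups])  # ['/job/1', '/job/2', '/job/3']
```

In the real spider the same shape should apply: build the list in parse, put that local list into meta on the Request, and read it back from response.meta in parseJob before yielding one Request per href.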