I am working with the Scrapy framework; below is my spider.py code: class Example(BaseSpider): name =


Asked: June 8, 2026


I am working with the Scrapy framework. Below is my spider.py code:

from scrapy.spider import BaseSpider
from scrapy.selector import HtmlXPathSelector
from scrapy.http import Request


class Example(BaseSpider):
    name = "example"
    # allowed_domains takes bare domain names, not URLs
    allowed_domains = ["www.example.com"]

    start_urls = [
        "http://www.example.com/servlet/av/search&SiteName=page1"
    ]

    def parse(self, response):
        hxs = HtmlXPathSelector(response)
        # Collect every link href inside the "knxa" table
        hrefs = hxs.select('//table[@class="knxa"]/tr/td/a/@href').extract()
        # Copy each href into forwarding_hrefs as a UTF-8 encoded string
        forwarding_hrefs = []
        for i in hrefs:
            forwarding_hrefs.append(i.encode('utf-8'))
        return Request('http://www.example.com/servlet/av/search&SiteName=page2',
                       meta={'forwarding_hrefs': response.meta['forwarding_hrefs']},
                       callback=self.parseJob)

    def parseJob(self, response):
        print response, ">>>>>>>>>>>"

Result:

2012-07-18 17:29:15+0530 [example] DEBUG: Crawled (200) <GET http://www.example.com/servlet/av/search&SiteName=page1> (referer: None)
2012-07-18 17:29:15+0530 [MemorialReqionalHospital] ERROR: Spider error processing <GET http://www.example.com/servlet/av/search&SiteName=page2>
    Traceback (most recent call last):
      File "/usr/lib64/python2.7/site-packages/twisted/internet/base.py", line 1167, in mainLoop
        self.runUntilCurrent()
      File "/usr/lib64/python2.7/site-packages/twisted/internet/base.py", line 789, in runUntilCurrent
        call.func(*call.args, **call.kw)
      File "/usr/lib64/python2.7/site-packages/twisted/internet/defer.py", line 361, in callback
        self._startRunCallbacks(result)
      File "/usr/lib64/python2.7/site-packages/twisted/internet/defer.py", line 455, in _startRunCallbacks
        self._runCallbacks()
    --- <exception caught here> ---
      File "/usr/lib64/python2.7/site-packages/twisted/internet/defer.py", line 542, in _runCallbacks
        current.result = callback(current.result, *args, **kw)
      File "/home/local/user/project/example/example/spiders/example_spider.py", line 36, in parse
        meta={'forwarding_hrefs': response.meta['forwarding_hrefs']},
    exceptions.KeyError: 'forwarding_hrefs'

What I am trying to do is collect all the href values from

http://www.example.com/servlet/av/search&SiteName=page1

put them into forwarding_hrefs, and pass that list along with the next request so I can use it in the callback for

http://www.example.com/servlet/av/search&SiteName=page2

There I also want to add the href values from page2 to forwarding_hrefs, then loop over the list and yield a request for each href. That is my plan, but it fails with the error above. What is wrong with this code? As I understand it, meta is meant to carry data (such as items) from one callback to the next.
Can anyone tell me how to pass the forwarding_hrefs list from the parse method to the parseJob method?

I hope I explained this well; if not, please let me know.

Thanks in advance.


1 Answer

Editorial Team · Answer 1 · 2026-06-08T00:36:24+00:00

I haven't tried running it, but it looks like you have an error here:

 return Request('http://www.example.com/servlet/av/search&SiteName=page2',
                meta={'forwarding_hrefs': response.meta['forwarding_hrefs']},
                callback=self.parseJob)

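You are passing response.meta['forwarding_hrefs'], but that key doesn't exist for this response. The response being parsed comes from the page1 request built from start_urls, and nothing ever put 'forwarding_hrefs' into that request's meta, so the lookup raises KeyError.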
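
To make the failure concrete, here is a minimal plain-Python 3 model of how meta travels. FakeRequest, FakeResponse and lookup_forwarding_hrefs are hypothetical stand-ins, not Scrapy's classes; the one behaviour they mirror is that a response's meta is the meta of the request that produced it. The href value is made up.

```python
# Hypothetical stand-ins for Scrapy's Request/Response, for illustration only.
class FakeRequest:
    def __init__(self, url, meta=None):
        self.url = url
        # Start with an empty dict when no meta is given
        self.meta = dict(meta or {})


class FakeResponse:
    def __init__(self, request):
        self.request = request

    @property
    def meta(self):
        # Modelled on Scrapy: response.meta refers to response.request.meta
        return self.request.meta


def lookup_forwarding_hrefs(response):
    """Hypothetical helper: return the carried list, or the KeyError text."""
    try:
        return response.meta["forwarding_hrefs"]
    except KeyError as exc:
        return "KeyError: %s" % exc


# The page1 request is built from start_urls, so it carries no such key.
start = FakeRequest("http://www.example.com/servlet/av/search&SiteName=page1")
print(lookup_forwarding_hrefs(FakeResponse(start)))
# prints: KeyError: 'forwarding_hrefs'

# Putting the local list into the new request's meta makes the key available.
fixed = FakeRequest("http://www.example.com/servlet/av/search&SiteName=page2",
                    meta={"forwarding_hrefs": ["/job?id=1"]})
print(lookup_forwarding_hrefs(FakeResponse(fixed)))
# prints: ['/job?id=1']
```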

Pass the local list you just built instead:

 return Request('http://www.example.com/servlet/av/search&SiteName=page2',
                meta={'forwarding_hrefs': forwarding_hrefs},
                callback=self.parseJob)

forwarding_hrefs is the list you built in parse, so this sends it to parseJob inside meta. response.meta is a shortcut to the meta of the request that produced the response, so inside parseJob you can read response.meta['forwarding_hrefs'], because the key will exist on that response object.
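
Once the list arrives in parseJob, the rest of the plan from the question (add page2's links to the list, then request each one) is ordinary list handling. Below is a minimal plain-Python 3 sketch of that step under some assumptions: build_follow_urls is a hypothetical helper, page2's hrefs are passed in as a plain list (in the spider they would come from the same kind of XPath query used in parse), relative hrefs are resolved against the page URL with urllib.parse.urljoin, and the sample hrefs are made up. Inside parseJob you would then yield a Request(url, callback=...) for each URL it returns, with a callback of your choosing.

```python
from urllib.parse import urljoin


def build_follow_urls(page_url, carried_hrefs, page_hrefs):
    """Merge hrefs carried in meta with hrefs found on the current page.

    Hypothetical helper: returns absolute URLs in first-seen order,
    with duplicates removed.
    """
    urls = []
    seen = set()
    for href in list(carried_hrefs) + list(page_hrefs):
        url = urljoin(page_url, href)  # resolve relative links against the page
        if url not in seen:
            seen.add(url)
            urls.append(url)
    return urls


page2_url = "http://www.example.com/servlet/av/search&SiteName=page2"
carried = ["/job?id=1", "/job?id=2"]                          # made-up, from page1
on_page2 = ["/job?id=2", "http://www.example.com/job?id=3"]   # made-up, from page2
for url in build_follow_urls(page2_url, carried, on_page2):
    print(url)
# http://www.example.com/job?id=1
# http://www.example.com/job?id=2
# http://www.example.com/job?id=3
```

Scrapy's default duplicate filter would also drop repeated requests unless dont_filter=True is set, so deduplicating here mainly keeps the list itself tidy.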
