Suppose this is my code from scrapy.spider import BaseSpider from scrapy.selector import HtmlXPathSelector from

Question


Asked: June 16, 2026


Suppose this is my code

from scrapy.spider import BaseSpider
from scrapy.selector import HtmlXPathSelector

from dmoz.items import DmozItem

class DmozSpider(BaseSpider):
   domain_name = "dmoz.org"
   start_urls = [
       "http://www.dmoz.org/Computers/Programming/Languages/Python/Books/",
       "http://www.dmoz.org/Computers/Programming/Languages/Python/Resources/"
   ]

   def parse(self, response):
       hxs = HtmlXPathSelector(response)
       sites = hxs.select('//ul[2]/li')
       items = []
       for site in sites:
           item = DmozItem()
           item['title'] = site.select('a/text()').extract()
           item['link'] = site.select('a/@href').extract()
           item['desc'] = site.select('text()').extract()
           items.append(item)
       return items

SPIDER = DmozSpider()

If I had used CrawlSpider, I could use rules to implement the link extractor. But how can I specify rules in a BaseSpider, as in the example above? Rules are only available in CrawlSpider, not in BaseSpider.


1 Answer

Editorial Team · Answer 1 · 2026-06-16T00:36:23+00:00

Perhaps you could parse the response for your rule criteria and then pass the successful responses on to a second callback? Pseudo-code below:

from scrapy.http import Request
from scrapy.selector import HtmlXPathSelector

# Both functions below are methods of the spider class.

def parse(self, response):
    # check response for rule criteria
    ...
    if rule:
        # create a new request and hand its response to the second callback
        return Request("http://www.example.com/follow", callback=self.parse2)

def parse2(self, response):
    hxs = HtmlXPathSelector(response)
    # do stuff with the successful response
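
The "check response for rule criteria" step could be sketched roughly as below. This is only an illustration in plain Python, not Scrapy's own rule mechanism: select_links and its allow and deny parameters are hypothetical names chosen for this example, loosely imitating the allow/deny patterns a link extractor accepts.

```python
import re
from urllib.parse import urljoin


def select_links(base_url, hrefs, allow=(), deny=()):
    """Return absolute URLs built from hrefs that pass the allow/deny patterns.

    Hypothetical helper: a URL is kept if it matches at least one allow
    pattern (or no allow patterns were given) and matches no deny pattern.
    """
    allow_res = [re.compile(p) for p in allow]
    deny_res = [re.compile(p) for p in deny]
    selected = []
    seen = set()
    for href in hrefs:
        # Resolve relative links against the page they were found on.
        url = urljoin(base_url, href)
        if allow_res and not any(r.search(url) for r in allow_res):
            continue
        if any(r.search(url) for r in deny_res):
            continue
        # Skip links already selected from this page.
        if url not in seen:
            seen.add(url)
            selected.append(url)
    return selected
```

Inside parse, the hrefs might come from something like hxs.select('//a/@href').extract(), and each URL returned by the helper could then be wrapped in Request(url, callback=self.parse2), as in the pseudo-code above.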
