Here is my code. My parse_item method is not getting called.

Question

Asked: May 30, 2026 (2026-05-30T14:56:35+00:00)

Here is my code. My parse_item method is not getting called.

from scrapy.contrib.spiders import CrawlSpider, Rule
from scrapy.contrib.linkextractors.sgml import SgmlLinkExtractor
from scrapy.selector import HtmlXPathSelector

class SjsuSpider(CrawlSpider):

    name = 'sjsu'
    allowed_domains = ['sjsu.edu']
    start_urls = ['http://cs.sjsu.edu/']
    # allow=() is used to match all links
    rules = [Rule(SgmlLinkExtractor(allow=()), follow=True),
             Rule(SgmlLinkExtractor(allow=()), callback='parse_item')]

    def parse_item(self, response):
        print "some message"
        open("sjsupages", 'a').write(response.body)

1 Answer

Editorial Team · Answer 1 · 2026-05-30T14:56:37+00:00

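Setting allowed_domains to ['cs.sjsu.edu'] would limit the crawl to that host, but it is not what stops parse_item from being called. Scrapy's offsite filtering also accepts subdomains of an allowed domain, so 'sjsu.edu' already covers cs.sjsu.edu.

The problem is in the rules. When several rules match the same link, CrawlSpider uses only the first one in the list. Both of your rules match every link, and the first one has no callback, so the second rule (the one with callback='parse_item') never applies. Merge them into a single rule: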

rules = [Rule(SgmlLinkExtractor(), follow=True, callback='parse_item')]
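To see why merging the two rules matters, here is a rough standard-library sketch of "first matching rule wins", which is how CrawlSpider is documented to choose between rules that match the same link. It is a simplified model, not Scrapy's actual code: `SimpleRule` and `pick_rule` are hypothetical names invented for this illustration, and an empty regex stands in for a link extractor that matches everything.

```python
import re
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class SimpleRule:
    """Hypothetical stand-in for a crawl rule (not a Scrapy class)."""
    pattern: str                    # regex the link must match; "" matches every link
    callback: Optional[str] = None  # name of the method to call, if any
    follow: bool = False            # whether links on the fetched page are followed


def pick_rule(url: str, rules: Sequence[SimpleRule]) -> Optional[SimpleRule]:
    """Return the first rule whose pattern matches the URL, or None."""
    for rule in rules:
        if re.search(rule.pattern, url):
            return rule  # later rules are never consulted for this link
    return None


# The two rules from the question: both match everything, only the second has a callback.
original = [
    SimpleRule("", follow=True),
    SimpleRule("", callback="parse_item"),
]

# The single merged rule from the answer.
merged = [SimpleRule("", callback="parse_item", follow=True)]

url = "http://cs.sjsu.edu/faculty/"  # example link, assumed for illustration
print(pick_rule(url, original).callback)  # callback of the rule chosen from the question's list
print(pick_rule(url, merged).callback)    # callback of the merged rule
```

In this model the question's rule list always selects the callback-less rule, which matches the symptom of parse_item never running. The merged rule both follows links and invokes the callback.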
