Let’s say I have a crawl spider similar to this example: from scrapy.contrib.spiders import


Asked: May 21, 2026 (2026-05-21T02:03:04+00:00)


Let’s say I have a crawl spider similar to this example:
from scrapy.contrib.spiders import CrawlSpider, Rule
from scrapy.contrib.linkextractors.sgml import SgmlLinkExtractor
from scrapy.selector import HtmlXPathSelector
from scrapy.item import Item

class MySpider(CrawlSpider):
    name = 'example.com'
    allowed_domains = ['example.com']
    start_urls = ['http://www.example.com']

    rules = (
        # Extract links matching 'category.php' (but not matching 'subsection.php')
        # and follow links from them (since no callback means follow=True by default).
        Rule(SgmlLinkExtractor(allow=('category\.php', ), deny=('subsection\.php', ))),

        # Extract links matching 'item.php' and parse them with the spider's method parse_item
        Rule(SgmlLinkExtractor(allow=('item\.php', )), callback='parse_item'),
    )

    def parse_item(self, response):
        self.log('Hi, this is an item page! %s' % response.url)

        hxs = HtmlXPathSelector(response)
        item = Item()
        item['id'] = hxs.select('//td[@id="item_id"]/text()').re(r'ID: (\d+)')
        item['name'] = hxs.select('//td[@id="item_name"]/text()').extract()
        item['description'] = hxs.select('//td[@id="item_description"]/text()').extract()
        return item

Let’s say I wanted to get some information like the sum of the IDs from each of the pages, or the average number of characters in the description across all of the parsed pages. How would I do it?

Also, how could I get averages for a particular category?


1 Answer

Editorial Team · Answer 1 · 2026-05-21T02:03:05+00:00


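You could use Scrapy’s stats collector to accumulate running totals as each page is parsed (for example an item count, a sum of the IDs, and a total description length), then derive sums and averages from those totals. For per-category stats, keep a separate stats key for each category.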
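As a rough sketch of that idea, the helper below adds one parsed item to running totals. It assumes `stats` is a Scrapy stats collector (or anything with the same `inc_value(key, count)` method), that the item fields are lists of strings as returned by `.re()` and `.extract()`, and that you can work out a category name yourself. The function name, the `category` argument, and the key names are all hypothetical.

```python
def record_item(stats, item, category=None):
    """Add one scraped item to running totals kept in a stats collector.

    `stats` only needs an inc_value(key, count) method, which Scrapy's
    stats collectors provide. Key names here are made up for illustration.
    """
    # .re() and .extract() return lists of strings, so reduce them first.
    id_total = sum(int(value) for value in item['id'])
    description_chars = sum(len(text) for text in item['description'])

    # Always update the overall totals; add per-category totals if known.
    prefixes = ['items']
    if category is not None:
        prefixes.append('items/category/%s' % category)

    for prefix in prefixes:
        stats.inc_value('%s/count' % prefix, 1)
        stats.inc_value('%s/id_sum' % prefix, id_total)
        stats.inc_value('%s/description_chars' % prefix, description_chars)
```

In Scrapy versions that expose the collector as `crawler.stats`, this could be called from `parse_item` as `record_item(self.crawler.stats, item, category)` just before returning the item. How `category` is derived (from the URL, the referring page, or a field on the page) depends on the site.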

For a quick dump of all stats gathered during a crawl, you can add STATS_DUMP = True to your settings.py.
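The dumped stats are a flat mapping of keys to values, so averages still have to be computed from the totals. The sketch below assumes the spider has been incrementing two hypothetical counters, `<prefix>/count` and `<prefix>/description_chars`, and that `stats_dict` is a plain dict such as the one returned by the collector's `get_stats()` method.

```python
def average_description_length(stats_dict, prefix='items'):
    """Return the mean description length for the counters under `prefix`.

    Returns None when nothing has been counted, to avoid dividing by zero.
    The key names are assumptions; use whatever keys your spider writes.
    """
    count = stats_dict.get('%s/count' % prefix, 0)
    if not count:
        return None
    total_chars = stats_dict.get('%s/description_chars' % prefix, 0)
    return total_chars / float(count)
```

Passing a per-category prefix, for instance `'items/category/books'`, gives the average for that category alone, provided the spider kept separate counters under that prefix.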

Redis (via redis-py) is also a great option for stats collection.
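If you go the Redis route, one possible layout is a hash per category holding the running totals. The sketch below assumes `client` is a redis-py client (it only uses `hincrby` and `hgetall`); the `crawl:stats:` key prefix, the field names, and both function names are invented for illustration.

```python
def record_item_in_redis(client, category, id_total, description_chars):
    """Increment per-category totals stored in a Redis hash."""
    key = 'crawl:stats:%s' % category
    client.hincrby(key, 'count', 1)
    client.hincrby(key, 'id_sum', id_total)
    client.hincrby(key, 'description_chars', description_chars)


def category_average(client, category):
    """Return the mean description length for a category, or None if empty."""
    raw = client.hgetall('crawl:stats:%s' % category)
    totals = {}
    for field, value in raw.items():
        # redis-py returns bytes unless the client decodes responses.
        if isinstance(field, bytes):
            field = field.decode('utf-8')
        totals[field] = int(value)
    if not totals.get('count'):
        return None
    return totals['description_chars'] / float(totals['count'])
```

A client could be created with something like `redis.Redis(host='localhost', port=6379)`, assuming a Redis server is running there. Because the totals live outside the crawler process, they survive restarts and can be shared by several spiders.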
