The Archive Base

Editorial Team
Asked: May 31, 2026 (2026-05-31T10:44:14+00:00)

We have a system written with Scrapy to crawl a few websites. There are several spiders, and a few cascaded pipelines that process all items passed on by all crawlers.
One of the pipeline components queries Google's servers to geocode addresses.
Google imposes a limit of 2500 requests per day per IP address, and threatens to ban an IP address if it keeps querying Google after Google has responded with the warning message 'OVER_QUERY_LIMIT'.

Hence I want to know about any mechanism which I can invoke from within the pipeline that will completely and immediately stop all further crawling/processing of all spiders and also the main engine.

I have checked other similar questions and their answers have not worked:

  • Force my scrapy spider to stop crawling
from scrapy.project import crawler
crawler._signal_shutdown(9,0) #Run this if the cnxn fails.

This does not work: the spider takes time to stop executing, so many more requests are made to Google in the meantime (which could get my IP address banned).

import sys
sys.exit("SHUT DOWN EVERYTHING!")

This one doesn't work at all: items keep getting generated and passed to the pipeline, although the log reports sys.exit() -> exceptions.SystemExit raised (to no effect).

  • How can I make scrapy crawl break and exit when encountering the first exception?
crawler.engine.close_spider(self, 'log message')

This one has the same problem as the first case mentioned above.

I also tried:

scrapy.project.crawler.engine.stop()

to no avail.

EDIT:
If I do in the pipeline:

from scrapy.contrib.closespider import CloseSpider

what should I pass as the 'crawler' argument to CloseSpider's __init__() from the scope of my pipeline?

1 Answer

  1. Editorial Team
    Added an answer on May 31, 2026 at 10:44 am

    You can raise a CloseSpider exception to close down a spider.
    However, I don’t think this will work from a pipeline.

    EDIT: avaleske notes in the comments to this answer that he was able to raise a CloseSpider exception from a pipeline. That is probably the wisest option to use.

    A similar situation has been described on the Scrapy Users group, in this thread.

    I quote:

    To close an spider for any part of your code you should use
    engine.close_spider method. See this extension for an usage
    example:
    https://github.com/scrapy/scrapy/blob/master/scrapy/contrib/closespider.py#L61

    You could write your own extension, using closespider.py as an example, that shuts down a spider once a certain condition has been met.

    Another “hack” would be to set a flag on the spider in the pipeline. For example:

    pipeline:

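    def process_item(self, item, spider):
        # some_flag stands for your own condition, e.g. the geocoder replied OVER_QUERY_LIMIT
        if some_flag:
            spider.close_down = True
        # a pipeline must return the item (or raise DropItem) so later stages still run
        return item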
    

    spider:

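    from scrapy.exceptions import CloseSpider

    def parse(self, response):
        # getattr with a default: the attribute does not exist until the pipeline sets it
        if getattr(self, 'close_down', False):
            raise CloseSpider(reason='API usage exceeded')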
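The engine.close_spider route from the quoted thread can also be called straight from the pipeline. Below is a minimal sketch of that idea, not a tested recipe for any particular Scrapy version. It assumes the pipeline is built through Scrapy's from_crawler hook so that it can reach crawler.engine. The names GeocodePipeline, lookup_address, quota_exhausted, and the item keys 'address' and 'location' are hypothetical and only serve the illustration. Because closing a spider is asynchronous, items already in flight still reach the pipeline, so the sketch also latches a flag and stops calling the geocoder the moment OVER_QUERY_LIMIT is seen.

```python
def lookup_address(address):
    """Hypothetical placeholder for a real geocoding client.

    Expected to return a dict with a 'status' key,
    e.g. {'status': 'OK', 'location': (lat, lng)}.
    """
    raise NotImplementedError("plug in a real geocoding call here")


class GeocodePipeline:
    """Geocodes items until the quota is hit, then stops calling the API."""

    def __init__(self, crawler, geocode):
        self.crawler = crawler        # gives access to crawler.engine
        self.geocode = geocode        # callable: address -> dict with 'status'
        self.quota_exhausted = False  # latched once OVER_QUERY_LIMIT is seen

    @classmethod
    def from_crawler(cls, crawler):
        # Scrapy calls this hook to build the pipeline and passes the crawler in.
        return cls(crawler, geocode=lookup_address)

    def process_item(self, item, spider):
        if self.quota_exhausted:
            # Items queued before shutdown completes still arrive here;
            # pass them through without touching the geocoder again.
            return item

        result = self.geocode(item["address"])
        if result.get("status") == "OVER_QUERY_LIMIT":
            self.quota_exhausted = True
            # Ask the engine to close this spider; this does not stop it instantly.
            self.crawler.engine.close_spider(spider, "over_query_limit")
            return item

        item["location"] = result.get("location")
        return item
```

With several spiders running in one process, each spider would need its own close_spider call. The latch is what protects the quota in the meantime, since it blocks further geocoding requests regardless of how long the spiders take to wind down.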
    
© 2021 The Archive Base. All Rights Reserved