I have a two part question. First, I’m writing a web-scraper based on the

Question

Asked: May 27, 2026

I have a two part question.

First, I'm writing a web scraper based on the CrawlSpider class in Scrapy. I'm aiming to scrape a website that has many thousands of records (possibly hundreds of thousands). These records are buried two to three levels below the start page, so the spider starts on a certain page, crawls until it finds a specific type of record, and then parses the HTML. What methods exist to keep my spider from overloading the site? Is there a way to do things incrementally, or to put a pause between requests?

Second, and related, is there a method with Scrapy to test a crawler without placing undue stress on a site? I know you can kill the program while it runs, but is there a way to make the script stop after hitting something like the first page that has the information I want to scrape?

Any advice or resources would be greatly appreciated.

1 Answer

Editorial Team · Answer 1 · 2026-05-27T13:46:08+00:00

Is there possibly a way to do things incrementally

I use Scrapy's HTTP cache to scrape the site incrementally:

HTTPCACHE_ENABLED = True
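A minimal settings.py sketch of that cache setup, assuming a standard Scrapy project layout. HTTPCACHE_DIR and HTTPCACHE_EXPIRATION_SECS are standard Scrapy settings that the answer does not mention; they are shown here with their default values.

```python
# settings.py: sketch of an HTTP cache setup for incremental development.
# With the cache on, a response that was already downloaded is read from disk
# on the next run instead of being requested from the site again.
HTTPCACHE_ENABLED = True

# Directory (inside the project's .scrapy data dir) where responses are stored.
HTTPCACHE_DIR = "httpcache"

# 0 means cached responses never expire; raise it if the pages change often.
HTTPCACHE_EXPIRATION_SECS = 0
```

While you are still tuning your parsing code, this lets you rerun the crawl repeatedly; only URLs that are not yet in the cache generate real requests.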

Alternatively, you can use the Jobs feature (pausing and resuming crawls), new in Scrapy 0.14.
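The Jobs feature is driven by the JOBDIR setting. The sketch below uses a hypothetical directory name and assumes a spider named records; the same value can also be passed per run with scrapy crawl records -s JOBDIR=crawls/records-1. Stop the crawl with a single Ctrl-C so Scrapy can shut down cleanly and save its state, then run the same command again to resume.

```python
# settings.py: sketch of persisting crawl state so a long crawl can be
# stopped and resumed in chunks.
# Scrapy keeps the pending request queue and the seen-request fingerprints
# in this directory. Use one directory per job; the name is hypothetical.
JOBDIR = "crawls/records-1"
```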

or put a pause in between different requests?

Check these settings:

DOWNLOAD_DELAY
RANDOMIZE_DOWNLOAD_DELAY
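A sketch of those settings in settings.py. The values are illustrative assumptions to tune per site, and CONCURRENT_REQUESTS_PER_DOMAIN is an extra standard setting not mentioned above. The delay_bounds function is a hypothetical helper, not part of Scrapy; it only restates the wait range the Scrapy documentation gives for RANDOMIZE_DOWNLOAD_DELAY.

```python
# settings.py: sketch of polite-crawling settings (illustrative values).
DOWNLOAD_DELAY = 2.0                # base wait, in seconds, between requests to the same site
RANDOMIZE_DOWNLOAD_DELAY = True     # vary each wait around DOWNLOAD_DELAY (Scrapy's default)
CONCURRENT_REQUESTS_PER_DOMAIN = 1  # at most one request in flight per domain


def delay_bounds(download_delay, randomize=True):
    """Hypothetical helper: the wait range documented for these two settings.

    With RANDOMIZE_DOWNLOAD_DELAY enabled, Scrapy waits a random time between
    0.5 * DOWNLOAD_DELAY and 1.5 * DOWNLOAD_DELAY; otherwise the wait is fixed.
    """
    if randomize:
        return (0.5 * download_delay, 1.5 * download_delay)
    return (download_delay, download_delay)


print(delay_bounds(DOWNLOAD_DELAY, RANDOMIZE_DOWNLOAD_DELAY))  # (1.0, 3.0)
```

Newer Scrapy releases also ship an AutoThrottle extension (AUTOTHROTTLE_ENABLED = True) that adjusts the delay to the server's response times.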

is there a method with Scrapy to test a crawler without placing undue stress on a site?

You can try out and debug your code in the Scrapy shell.

I know you can kill the program while it runs, but is there a way to make the script stop after hitting something like the first page that has the information I want to scrape?

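You can also call scrapy.shell.inspect_response from any callback in your spider to open a shell on the response it is processing. The crawl pauses while the shell is open and resumes when you exit it.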
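To make a test run stop on its own, Scrapy's CloseSpider extension (a standard extension, not mentioned in the answer above) can close the spider after a given number of scraped items or downloaded pages. A minimal settings sketch; the limits are illustrative. The same settings can be passed per run, e.g. scrapy crawl records -s CLOSESPIDER_ITEMCOUNT=1, where records is a hypothetical spider name. Requests already in flight may still finish, so a few extra pages can be fetched before the spider closes.

```python
# settings.py: sketch of stopping a test crawl early via the CloseSpider extension.
# Close the spider once this many items have passed through the item pipelines
# (0 disables the limit).
CLOSESPIDER_ITEMCOUNT = 1

# Safety net: also close after this many responses have been downloaded.
CLOSESPIDER_PAGECOUNT = 10
```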

Any advice or resources would be greatly appreciated.

Scrapy documentation is the best resource.