I need to create a spider that crawls some data from a website.

Question

Asked: June 8, 2026

I need to create a spider that crawls some data from a website.
Part of the data is an external URL.

I have already created the spider that crawls the data from the root site, and now I want to write the spider for the external web pages.

I was thinking of creating a CrawlSpider that uses the SgmlLinkExtractor to follow some specific links on each external web page.

What is the recommended way to communicate the list of start URLs to the second spider?

My idea is to generate a JSON file of the items and read the URL attribute from it in start_requests of the second spider.


Editorial Team · Answer 1 · 2026-06-08T02:08:23+00:00

I already created the spider that crawls the data from the root site
and now I want to write the spider for external web pages.

Save these external page URLs to a database.
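One way to do that is a small item pipeline in the first spider's project. The sketch below is only an illustration: it assumes SQLite as the database, and the class name `ExternalUrlPipeline`, the table `external_urls`, the file name `external_urls.db` and the item field `external_url` are all hypothetical names, not anything from your project.

```python
import sqlite3


class ExternalUrlPipeline:
    """Item pipeline sketch: store each item's external URL in SQLite."""

    def __init__(self, db_path="external_urls.db"):  # hypothetical file name
        self.db_path = db_path
        self.conn = None

    def open_spider(self, spider):
        # Called once when the spider starts: open the DB and ensure the table exists.
        self.conn = sqlite3.connect(self.db_path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS external_urls (url TEXT PRIMARY KEY)"
        )

    def process_item(self, item, spider):
        url = item.get("external_url")  # hypothetical field name
        if url:
            # INSERT OR IGNORE skips URLs that are already stored.
            self.conn.execute(
                "INSERT OR IGNORE INTO external_urls (url) VALUES (?)", (url,)
            )
            self.conn.commit()
        return item  # pass the item on to any later pipelines

    def close_spider(self, spider):
        # Called once when the spider finishes.
        self.conn.close()
```

A pipeline like this would be enabled through the `ITEM_PIPELINES` setting of the first spider's project. Using the URL as the primary key keeps the table free of duplicates if the same external link shows up on several pages.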

What is the recommended way to communicate the list of start URLs to the second spider?

Override BaseSpider.start_requests in your second spider and create the requests from the URLs you read from the database.
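A minimal sketch of that override, under some assumptions: the URLs sit in a SQLite table named `external_urls` with a `url` column, and the names `load_external_urls`, `ExternalPagesSpider` and `db_path` are made up for illustration. `BaseSpider` was renamed to `scrapy.Spider` in later Scrapy releases, so the sketch subclasses that. Newer versions may also prefer a different entry point than `start_requests`, so check the documentation for the version you have installed.

```python
import sqlite3

try:
    import scrapy
except ImportError:  # lets load_external_urls() be tried without Scrapy installed
    scrapy = None


def load_external_urls(db_path):
    """Return the stored external URLs, in insertion order."""
    conn = sqlite3.connect(db_path)
    try:
        rows = conn.execute(
            "SELECT url FROM external_urls ORDER BY rowid"
        ).fetchall()
    finally:
        conn.close()
    return [row[0] for row in rows]


if scrapy is not None:

    class ExternalPagesSpider(scrapy.Spider):
        name = "external_pages"
        # Hypothetical default; can be overridden with -a db_path=... on the CLI.
        db_path = "external_urls.db"

        def start_requests(self):
            # Build the initial requests from the database instead of start_urls.
            for url in load_external_urls(self.db_path):
                yield scrapy.Request(url, callback=self.parse)

        def parse(self, response):
            # Placeholder callback: extract whatever the external pages need.
            yield {"url": response.url}
```

If you want to follow specific links on each external page, the same `start_requests` override should also work on a CrawlSpider subclass with your link-extractor rules. In that case leave the request callback unset so the rules are applied.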
