I am using Scrapy to crawl a website. The links I need to crawl are of the form http://www.somesite.com/details.html?pageId=<some_integer_id>. The value of some_integer_id ranges from 1 to roughly 100 (the upper bound is not exactly 100). What I do is this:
1. I create a function to generate a list of URLs:
    def generateURLs(self):
        url_list = []
        for i in range(1, 101):
            url_list.append('http://www.somesite.com/details.html?pageId=%d' % i)
        return url_list
2. I use this function to set the value of start_urls on the spider, like this:
    def __init__(self):
        self.start_urls = self.generateURLs()
Is this the recommended way to use Scrapy, or is there a better way to do this when I only need to change the value of one request parameter?
Thanks.
This method sounds fine; there is no “golden” method.
However, since Scrapy calls start_requests to produce the initial requests, you could instead override start_requests and yield one request per pageId, without building a list or overriding __init__.
The effect is the same, with less code.