I have to scrape 1000 links that are similar in structure and differ only in content.
I designed this spider, but I don’t want to put each URL in start_urls, run it, and repeat that 1000 times. I have all the URLs in a file, so how can I pass them to the spider as a parameter and loop over all 1000?
Create a spider that overrides BaseSpider’s `__init__` method. Inside it, read the file and append each URL to the `start_urls` list.
The code will look something like this:
Obviously, the way in which you loop through the file will depend on the type of file.
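As a rough illustration of how the loop changes with the file type, here is a sketch of a helper that handles two common cases. The function name `load_urls` and the CSV column name `url` are assumptions made for the example, not part of the original spider:

```python
import csv
from pathlib import Path


def load_urls(path, column="url"):
    """Return the URLs stored in a .csv file or a one-URL-per-line text file."""
    path = Path(path)
    if path.suffix.lower() == ".csv":
        # Assumption: the CSV has a header row with a column named `column`.
        with path.open(newline="", encoding="utf-8") as f:
            return [
                row[column].strip()
                for row in csv.DictReader(f)
                if (row.get(column) or "").strip()
            ]
    # Anything else is treated as plain text: one URL per line, blanks ignored.
    with path.open(encoding="utf-8") as f:
        return [line.strip() for line in f if line.strip()]
```

Inside the spider’s `__init__`, the result could then be assigned directly, e.g. `self.start_urls = load_urls(url_file)`.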
Also, you might look into using an item pipeline, for example one built on MySQLdb, to save the data after parsing it.
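To give an idea of the shape of such a pipeline, here is a sketch that uses the standard-library `sqlite3` module in place of MySQLdb so that it runs without a database server; a MySQL version would follow the same structure with a different connection and SQL dialect. The class name, the `pages` table, and the `url`/`title` fields are all hypothetical:

```python
import sqlite3


class SqlitePipeline:
    """Stores each scraped item in a local SQLite database."""

    def __init__(self, db_path="items.db"):
        # Hypothetical default path; Scrapy creates the pipeline with no arguments.
        self.db_path = db_path

    def open_spider(self, spider):
        # Called once when the spider starts: open the connection, create the table.
        self.conn = sqlite3.connect(self.db_path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS pages (url TEXT PRIMARY KEY, title TEXT)"
        )

    def process_item(self, item, spider):
        # Called for every item the spider yields; assumes dict-like items.
        self.conn.execute(
            "INSERT OR REPLACE INTO pages (url, title) VALUES (?, ?)",
            (item["url"], item.get("title")),
        )
        self.conn.commit()
        return item  # pass the item on to any later pipeline

    def close_spider(self, spider):
        # Called once when the spider finishes.
        self.conn.close()
```

A pipeline like this is switched on through the `ITEM_PIPELINES` setting in the project’s settings file, keyed by the class’s import path.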
EDIT
I will rewrite your spider for you. Technically, it is best practice to use a pipeline for some of what you are doing, but, for the sake of time, I will make your current spider work. One moment.
Try this:
I didn’t change anything other than the `__init__` and `obtenerId` methods.