I saw this post about making Scrapy crawl any site without the allowed_domains restriction.
Is there a better way of doing it, such as using a wildcard or regular expression in the allowed_domains variable, like this:
allowed_domains = ["*"]
I hope there is some way to do this other than hacking into the Scrapy framework.
Don’t set allowed_domains at all.
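When a spider has no allowed_domains, the offsite middleware builds a host pattern that matches everything, so no requests are filtered out. You can see this in the get_host_regex() function in this Scrapy file: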
https://github.com/scrapy/scrapy/blob/master/scrapy/contrib/spidermiddleware/offsite.py
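
To illustrate why leaving allowed_domains unset works, here is a minimal standard-library sketch of the idea behind get_host_regex(). It is a simplified re-creation, not Scrapy's actual code: the names build_host_regex and is_allowed are hypothetical, and the exact pattern Scrapy uses may differ between versions.

```python
import re
from urllib.parse import urlparse


def build_host_regex(allowed_domains):
    """Hypothetical helper mirroring the idea of get_host_regex()."""
    if not allowed_domains:
        # Nothing configured: the empty pattern matches every host.
        return re.compile("")
    # Each domain is escaped, so "*" is treated as a literal character,
    # not as a wildcard.
    domains = "|".join(re.escape(d) for d in allowed_domains)
    # Match the domain itself or any of its subdomains.
    return re.compile(r"^(.*\.)?(%s)$" % domains)


def is_allowed(url, allowed_domains):
    """Return True if the URL's host passes the offsite-style check."""
    host = urlparse(url).hostname or ""
    return bool(build_host_regex(allowed_domains).search(host))
```

In this sketch, is_allowed(url, None) accepts any host, while allowed_domains = ["*"] rejects ordinary hosts because the asterisk is escaped before the pattern is compiled. That is why omitting the attribute is the simpler route.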