I have written a scraper using PHP/cURL which works great but bottlenecks at cURL. AFAIK there is no way to improve the speed of cURL, but I have read of other languages/libraries which have faster speeds. Does anyone have experience in this area? What % improvement could I expect? Probably not worth the trouble for anything less than 25%.
An alternative might be parallel cron jobs?
cURL in PHP is very fast. You should look into using curl_multi to run your requests in parallel.
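A minimal sketch of the curl_multi pattern, assuming PHP 7.1+ with the cURL extension loaded; `fetch_all` is a hypothetical helper name, not part of any library. It adds one handle per URL, drives them all together with `curl_multi_exec`/`curl_multi_select`, then collects each body:

```php
<?php
// Hypothetical helper: fetch several URLs concurrently, return [url => body].
function fetch_all(array $urls, int $timeout = 30): array
{
    $urls = array_unique($urls); // results are keyed by URL, so drop duplicates
    $mh = curl_multi_init();
    $handles = [];

    foreach ($urls as $url) {
        $ch = curl_init($url);
        curl_setopt_array($ch, [
            CURLOPT_RETURNTRANSFER => true, // return the body instead of printing it
            CURLOPT_FOLLOWLOCATION => true,
            CURLOPT_TIMEOUT        => $timeout, // arbitrary per-request limit in seconds
        ]);
        curl_multi_add_handle($mh, $ch);
        $handles[$url] = $ch;
    }

    // Keep driving the transfers until none are still running.
    do {
        $status = curl_multi_exec($mh, $running);
        if ($running) {
            // Wait for activity on any handle (up to 1s) instead of busy-looping.
            curl_multi_select($mh, 1.0);
        }
    } while ($running && $status === CURLM_OK);

    // Collect the bodies; the handles are freed once they go out of scope.
    $results = [];
    foreach ($handles as $url => $ch) {
        $results[$url] = curl_multi_getcontent($ch);
        curl_multi_remove_handle($mh, $ch);
    }
    curl_multi_close($mh);

    return $results;
}
```

The total wall time is then roughly that of the slowest request rather than the sum of all of them, which is where the speedup comes from when most of the time is network wait.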
If you run your scraper through Fiddler, you will see that 99% of the time is spent waiting for the remote server to respond.
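Without Fiddler, cURL itself records per-phase timings that `curl_getinfo` exposes. A small sketch (the `timing_report` name is hypothetical): the values are cumulative seconds from the start of the request, so a large gap between `connect` and `first_byte` is mostly time spent waiting on the remote server rather than in PHP or cURL:

```php
<?php
// Hypothetical helper: run one request and report cURL's cumulative timings.
function timing_report(string $url): array
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // don't print the body
    curl_exec($ch);

    // All values are seconds (float) measured from the start of the request.
    return [
        'dns'        => curl_getinfo($ch, CURLINFO_NAMELOOKUP_TIME),
        'connect'    => curl_getinfo($ch, CURLINFO_CONNECT_TIME),
        'first_byte' => curl_getinfo($ch, CURLINFO_STARTTRANSFER_TIME),
        'total'      => curl_getinfo($ch, CURLINFO_TOTAL_TIME),
    ];
}
```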
You will need to experiment to see how many parallel requests give you the best performance; the answer will differ from site to site. Some sites actually get slower under many concurrent requests if they are poorly built (missing database indexes, a slow server, etc.).
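One way to make that experiment easy is a rolling window: keep at most `$maxParallel` transfers in flight and start the next queued URL each time one finishes. A sketch assuming PHP 7.1+; `fetch_window` and its parameters are hypothetical, and each URL is stashed in `CURLOPT_PRIVATE` so a finished handle can be matched back to it:

```php
<?php
// Hypothetical helper: fetch $urls with at most $maxParallel requests at once.
function fetch_window(array $urls, int $maxParallel = 5): array
{
    $maxParallel = max(1, $maxParallel);
    $mh = curl_multi_init();
    $queue = array_values($urls);
    $results = [];
    $inFlight = 0;

    // Start the next queued URL.
    $addNext = function () use (&$queue, &$inFlight, $mh): void {
        $url = array_shift($queue);
        $ch = curl_init($url);
        curl_setopt_array($ch, [
            CURLOPT_RETURNTRANSFER => true,
            CURLOPT_TIMEOUT        => 30,   // arbitrary per-request limit
            CURLOPT_PRIVATE        => $url, // remember which URL this handle is for
        ]);
        curl_multi_add_handle($mh, $ch);
        $inFlight++;
    };

    // Fill the window.
    while ($inFlight < $maxParallel && $queue) {
        $addNext();
    }

    while ($inFlight > 0) {
        curl_multi_exec($mh, $running);
        if ($running) {
            curl_multi_select($mh, 1.0);
        }
        // Harvest finished transfers and refill the window as slots free up.
        while ($info = curl_multi_info_read($mh)) {
            $ch = $info['handle'];
            $url = curl_getinfo($ch, CURLINFO_PRIVATE);
            $results[$url] = $info['result'] === CURLE_OK
                ? curl_multi_getcontent($ch)
                : null; // failed transfer
            curl_multi_remove_handle($mh, $ch);
            $inFlight--;
            if ($queue) {
                $addNext();
            }
        }
    }
    curl_multi_close($mh);

    return $results;
}
```

Timing a full run at a few values of `$maxParallel` (say 2, 5, 10, 20) against the target site shows where it stops helping or starts hurting.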
I’ve written a web scraping framework that does a lot of this for you. Take a look, steal the code, and learn some new techniques.