I have a website that uses Nokogiri to extract data from many different websites. This process is run as a background job using the delayed_job gem. However, it takes around 3-4 seconds per page because it has to pause and wait for the other websites to respond.
I am currently just running them one after another, basically like this:
Websites.all.each do |website|
  # screen scrape
end
I would like to execute them in batches rather than one at a time, so that I don't have to wait for a server response from every site (which can take up to 20 seconds on occasion).
What would be the best Ruby or Rails way to do this?
Thanks for your help in advance.
You need to use delayed_job. Check out this Railscasts episode.
Keep in mind that most hosts charge for running this type of background work.
You can also use the spawn plugin if you don’t care about managing threads yourself; it is much, much easier!
This is literally all you need to do:
rails plugin install https://github.com/tra/spawn.git
For example:
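As a rough sketch of the batching idea in plain Ruby (standard-library threads, not the spawn plugin's own API), the pages in each batch can be fetched concurrently so that the waits overlap. The names `fetch_page` and `in_batches` and the batch size are assumptions made for illustration, not something from the question.

```ruby
require "net/http"
require "uri"

# Hypothetical helper: download one page body.
# Swap in the real Nokogiri scraping code here.
def fetch_page(url)
  Net::HTTP.get(URI(url))
end

# Hypothetical helper: call the block for every item, running at most
# batch_size calls at the same time. Results come back in input order.
def in_batches(items, batch_size, &block)
  results = []
  items.each_slice(batch_size) do |batch|
    # Start one thread per item in this batch.
    threads = batch.map { |item| Thread.new { block.call(item) } }
    # Thread#value waits for the thread to finish and returns the block's
    # result (re-raising any exception the thread hit).
    results.concat(threads.map(&:value))
  end
  results
end
```

With this, something like `in_batches(urls, 10) { |url| fetch_page(url) }` would wait roughly as long as the slowest site in each batch of 10 rather than the sum of all of them. Inside Rails, threads that touch ActiveRecord each check out their own database connection, so it may be simpler to do only the HTTP fetch in the threads and save the results afterwards.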
http://railscasts.com/episodes/171-delayed-job
https://github.com/tra/spawn