I have a Ruby script that crawls a site and takes 40 minutes to run. What I would like to do is dump the results into a database and be able to do regular ActiveRecord stuff on the data.
- If I put it in a Rails app and have a ‘start’ button that initiates the script, will it time out?
- Ideally I want the script to run at least once a day – and update the db – so would my best bet be to create a rake task, or is there some other way to do this?
- If I am wrapping this in a Rails App, in which folder should I put the script and what is the best way to approach it? I can’t drop it in a model file – because that doesn’t make any sense.
- I have never done any ‘Job’ type processing before, but this sounds like it may fall in that purview. What other things should I consider when doing this?
Edit 1:
Another question is, if I put this particular Ruby script in my /lib directory, how do I get it to interact with the DB? I usually interact with the DB from the model and controller. How would I store the results in my DB after it is run?
I’ll try to give you some straightforward answers.
A) You would most likely run it as a background job rather than inside the web request; a request that takes 40 minutes will almost certainly time out. There are some decent gems for that. Consider https://github.com/defunkt/resque or something more lightweight such as https://github.com/tobi/delayed_job.
B) A rake task would suffice; you can then run it through either of the libraries mentioned above. Another option would be a cron job that runs the task on a schedule.
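C) You should put it in the lib/ directory. Code in lib/ can use your models the same way a controller does, as long as the Rails environment is loaded.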
D) You should always have some kind of processing log available in order to track potential errors. Be sure to read the instructions carefully if you choose either of the two libraries mentioned above.
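To make C) and D) a bit more concrete, here is a minimal sketch of what a class in lib/ might look like. Everything named here is an assumption for illustration: `SiteCrawler`, `FakePageStore`, and the `url`/`title` attributes are hypothetical, and the list of pages passed to `run` stands in for whatever your real crawl produces. In a Rails app you would pass an ActiveRecord model (say a hypothetical `Page`) as the store, since models respond to `create!`; the in-memory stand-in is only there so the sketch runs without Rails.

```ruby
require "logger"
require "stringio"

# Hypothetical crawler class; in a Rails app it could live in lib/site_crawler.rb.
class SiteCrawler
  # store:  anything that responds to create!(attributes), e.g. an ActiveRecord model
  # logger: where the processing log goes (see point D)
  def initialize(store:, logger: Logger.new($stdout))
    @store = store
    @logger = logger
  end

  # pages is a list of [url, title] pairs standing in for real crawl results.
  # Returns the number of pages that were saved.
  def run(pages)
    saved = 0
    pages.each do |url, title|
      begin
        @store.create!(url: url, title: title)
        saved += 1
      rescue StandardError => e
        # Log the failure and keep going, so one bad page does not abort the run.
        @logger.error("failed to save #{url.inspect}: #{e.message}")
      end
    end
    @logger.info("crawl finished: #{saved}/#{pages.size} pages saved")
    saved
  end
end

# In-memory stand-in for a model, used only to keep this sketch self-contained.
class FakePageStore
  attr_reader :rows

  def initialize
    @rows = []
  end

  def create!(attrs)
    raise ArgumentError, "url is blank" if attrs[:url].to_s.empty?
    @rows << attrs
    attrs
  end
end

# A rake task could call the crawler like this (needs Rails, so shown as a comment;
# Page is a hypothetical model):
#   task crawl: :environment do
#     SiteCrawler.new(store: Page).run(pages)
#   end

log = StringIO.new
store = FakePageStore.new
crawler = SiteCrawler.new(store: store, logger: Logger.new(log))
saved = crawler.run([["http://example.com/", "Home"], ["", "Broken"]])
puts "saved #{saved} page(s)"
```

The `:environment` prerequisite in the commented rake task is what loads the Rails app, so the models and the database connection are available to code in lib/. Passing the store in, rather than hard-coding a model name, is just one way to keep the crawler easy to try outside the app; calling the model directly would work as well.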