I need to periodically import some data into my Rails app on Heroku.
The task to execute is split into the following parts:
* download a big zip file (~100 MB) from a website
* unzip the file (unzipped size is ~1.5 GB)
* run a rake script that reads those files and creates or updates records using my ActiveRecord models
* cleanup
How can I do this on Heroku? Is it better to use some external storage (e.g. S3)?
How would you approach such a thing?
Ideally this needs to run every night.
I tried exactly the same thing a couple of days ago and concluded that it can't be done because of the memory limit that Heroku imposes on each process. (I was building a data structure from the files I read from the internet and then trying to push it to the DB.)
I was using a rake task that would pull and parse a couple of big files and then populate the database.
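Since the problem was holding the whole parsed data set in memory, one thing that might help is reading the files lazily and writing in small batches. This is only a minimal sketch, not something I have run on Heroku: `each_line_batch` is a hypothetical helper, and it assumes the unzipped files are line-oriented text.

```ruby
# Hypothetical helper: reads a text file lazily and yields its lines in
# fixed-size batches, so at most `batch_size` lines are held at once.
def each_line_batch(path, batch_size: 1000)
  # File.foreach without a block returns an Enumerator that reads one
  # line at a time; each_slice groups those lines into arrays.
  File.foreach(path, chomp: true).each_slice(batch_size) do |lines|
    yield lines
  end
end

# Inside the rake task, each batch would be saved and then discarded.
# `Record` and `import_line` are placeholders for your own model and
# parsing code, and the path is made up:
#
#   each_line_batch("tmp/import/items.txt", batch_size: 500) do |lines|
#     Record.transaction { lines.each { |line| import_line(line) } }
#   end
```

Whether this stays under the memory limit depends on the batch size and on what the parsing step allocates, so treat it as a starting point.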
As a workaround, I now run this rake task on my local machine, push the database to S3, and issue a heroku command from my local machine to restore the Heroku DB instance.
You could push to S3 using the fog library.
The command that I use to make a pgbackup on my local machine is
I have put together a rake task that automates all these steps.
Another thing you might try is to use a worker (DelayedJob). I guess you can configure your workers to run every 24 hours. I think workers don't have the 30-second limit, but I am not sure about their memory usage.