I’m trying to export about 300k records to XLS from a DelayedJob on Heroku, using the spreadsheet gem (http://spreadsheet.rubyforge.org/). Unfortunately I need to iterate over all records, because some fields are extracted from other related tables.
Exporting to CSV works fine (although it takes a lot of time), and it would be possible to write each record to S3 directly, as I go through them.
The problem is that with the spreadsheet gem I can’t seem to export an XLS report of these records efficiently, because the process consumes a lot of memory very quickly.
So the question is:
How would you export a lot of data from the database to an XLS file that will be hosted on S3, considering that 1. you cannot write to the filesystem on Heroku and 2. you should not exceed the memory quota of 512MB?
On Cedar you can write to the filesystem, though it is still ephemeral and will go away at least once a day, as well as on code pushes, restarts, etc.
I am unfamiliar with the particular gem you mention. However, the best approach would be to work through the records in batches, so that the garbage collector can free each batch as you go.
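As a rough sketch of the batching idea, assuming plain Ruby with only the standard library: `each_record` below is a hypothetical stand-in for the database (in a Rails app, ActiveRecord's `find_each` or `find_in_batches` would play that role), and the output is CSV written to a temp file rather than XLS, since I can't vouch for the spreadsheet gem's API.

```ruby
require "csv"
require "tempfile"

# Hypothetical stand-in for the database: yields records lazily, one at a time.
# In a Rails app, ActiveRecord's find_each / find_in_batches plays this role.
def each_record(total)
  return enum_for(:each_record, total) unless block_given?
  1.upto(total) { |id| yield({ id: id, name: "record #{id}" }) }
end

# Write records to `io` in fixed-size batches. Only the current batch is
# referenced at any time, so earlier batches can be garbage collected.
def export_in_batches(records, io, batch_size: 1000)
  io << CSV.generate_line(%w[id name])
  batches = 0
  records.each_slice(batch_size) do |batch|
    batch.each { |r| io << CSV.generate_line([r[:id], r[:name]]) }
    batches += 1
  end
  batches
end

batch_count = nil
line_count = nil
# The temp file lives on the ephemeral filesystem and is removed after the block.
Tempfile.create(["export", ".csv"]) do |file|
  batch_count = export_in_batches(each_record(2500), file, batch_size: 1000)
  file.flush
  line_count = File.foreach(file.path).count
  # An upload to S3 would go here, while the file still exists.
end
```

Writing to a file instead of building the report in a string keeps memory use roughly proportional to the batch size rather than to the total number of records.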
If you can come up with better, custom SQL, you can have Postgres do the work of pulling in the fields from the related tables.
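For example, a single join can replace the per-record lookups. This is only an illustration: the `orders` and `customers` tables and their columns are hypothetical, since the question doesn't show the actual schema.

```ruby
# Hypothetical schema: orders(id, total, customer_id) and customers(id, name).
# One query returns each order together with the related customer's name,
# so no extra query is needed per record.
export_sql = <<~SQL
  SELECT o.id, o.total, c.name AS customer_name
  FROM orders o
  LEFT JOIN customers c ON c.id = o.customer_id
  ORDER BY o.id
SQL

# In a Rails app the rows could then be read as plain arrays, skipping
# model instantiation, e.g.:
#   ActiveRecord::Base.connection.select_rows(export_sql)
```

Reading plain rows instead of full model objects should also reduce the memory used per record.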
Additionally you should look into cursors to avoid loading the entire dataset: http://www.postgresql.org/docs/8.3/static/plpgsql-cursors.html
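Here is a minimal sketch of reading through a cursor from Ruby. Note that it uses the SQL-level `DECLARE` / `FETCH` statements rather than the PL/pgSQL cursor variables described on the linked page. The helper `each_row_via_cursor`, the cursor name, and the `records` table are all made up for illustration. `conn` is assumed to respond to `exec(sql)` and return an enumerable of rows, as `PG::Connection` from the pg gem does; the `FakeConnection` class is there only so the sketch runs without a database.

```ruby
# Stream rows through a server-side cursor, `batch_size` rows per round trip.
# Cursors only live inside a transaction, hence the BEGIN / COMMIT.
def each_row_via_cursor(conn, query, batch_size: 1000, cursor: "export_cursor")
  conn.exec("BEGIN")
  begin
    conn.exec("DECLARE #{cursor} NO SCROLL CURSOR FOR #{query}")
    loop do
      rows = conn.exec("FETCH FORWARD #{batch_size} FROM #{cursor}").to_a
      break if rows.empty?
      rows.each { |row| yield row }
    end
    conn.exec("CLOSE #{cursor}")
    conn.exec("COMMIT")
  rescue StandardError
    conn.exec("ROLLBACK")
    raise
  end
end

# Stand-in for a real connection so the sketch can run without Postgres.
# It records every statement and serves rows from an in-memory array.
class FakeConnection
  attr_reader :statements

  def initialize(rows)
    @rows = rows.dup
    @statements = []
  end

  def exec(sql)
    @statements << sql
    if (m = sql.match(/\AFETCH FORWARD (\d+)/))
      @rows.shift(m[1].to_i)
    else
      []
    end
  end
end

conn = FakeConnection.new((1..5).map { |i| { "id" => i } })
ids = []
each_row_via_cursor(conn, "SELECT id FROM records ORDER BY id", batch_size: 2) do |row|
  ids << row["id"]
end
```

With a real connection, only `batch_size` rows are held on the client at a time, which is the point of using a cursor here.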