EDIT: I have totally rewritten this question for clarity. I got no comments and no answers earlier.
I am maintaining a 2.x Rails app with plenty of statistical data. Some data is real and some is estimated for the future years. Every year I need to update estimated data with real data and calculate new estimates.
I have been using BIG yml-files and migrations for loading the data into the app every year. My migrations are full of estimation calculations and data corrections.
Problem
My migrations are full of none-schema related material and I can’t even dream of doing db:migrate:reset without waiting few hours (if it even works). I’d love to see my migrations nice and clean – only with schema related modifications. But how I am suppose to update the data every year if not using migrations?
Help needed
I’d like to hear your comments and answers. I’m not looking for a silver bullet – more like best practises and ideas how people are handling similar situation.
It sounds like you have a large operation (data load using yml files) once a year but smaller operations once a month.
From my experience with statistical data you will probably end up doing more and more of these operations to clean and add more data.
I would use a job processing framework like resque and resque scheduler.
You can schedule the jobs to run once a month, year, day or constantly running. A job is something like loading yml files (or sets of yml files) or cleaning up data. You can control parameters to send to your job so you can use one class but alternate how it updates or cleans your data based on the way you enqueue or schedule the job.