The application’s code and configuration files are maintained in a code repository. But sometimes, as a part of the project, I also have a some data (which in some cases can be >100MB, >1GB or so), which is stored in a database. Git does a nice job in handling the code and its changes, but how can the development team easily share the data?
It doesn’t really fit in the code version control system, as it is mostly large binary files, and would make pulling updates a nightmare. But it does have to be synchronised with the repository, because some code revisions change the schema (ie migrations).
How do you handle such situations?
We have the data and schema stored in xml and use liquibase to handle the updates to both the schema and the data. The advantage here is that you can diff the files to see what’s going on, it plays nicely with any VCS and you can automate it.
Due to the size of your database this would mean a sizable “version 0” file. But, using the migration strategy, after that the updates should be manageable as they would only be deltas. You might be able to convert your existing migrations one-to-one to liquibase as well which might be nicer than a big-bang approach.
You can also leverage @belisarius’ strategy if your deltas are very large so each developer doesn’t have to apply the delta individually.