What are the pros and cons of using a dump file as the basis for data and schema migration, as opposed to a fully script-based approach or a database delta tool?
The context is that the application is in production, and there is only one production database. The application and database schema are in active development. Critical user data exists in the production database and must be rolled forward with deployment of new versions or fixes.
The solutions being discussed are:
Dump file basis –
- Start with a reference point dump file.
- Database alter scripts are checked into source control.
- Deployment entails loading the dump file and then running the alter scripts.
Schema + migration
- Entire schema and certain non-user configuration data are stored as DDLs and DMLs in SCM.
- Migration scripts against the latest release’s schema are stored in SCM.
- Deployment entails loading the schema and then migrating the data.
My intuition is that using a binary format as the basis is bad, but I need to be able to convince others (if that is indeed the case), who argue that it is necessary.
I re-formulated this question to make it easier to answer.
Below is the original question:
I am working with a team on a database-driven enterprise application
and looking to improve our process. Currently, there is a lot of
manual process around updating the database on all tiers. The goal is
an automated process that updates the database consistently
(in line with the idea of atomic commits and
closer to continuous delivery), which would bring numerous
advantages.

I think that the schema (and certain data necessary for application
configuration) should be represented in source control, and in
addition, any scripts necessary to transform and load user data from
the current production database. I have read that it is not advisable
to have a dump file (.dmp) in source control, and intuitively, I agree
strongly. However, not everyone on the project agrees with me.
The counter argument is that in reality it is not possible, or at
least is too difficult, to not start with a dump file. I am up
against my limit of database knowledge and can’t really debate
meaningfully… I am more a developer and not a database specialist.
The alternative suggested is that alter scripts be kept that alter the
dump to the latest schema.

Can someone help me understand the pros and cons of each approach
a little better? Is the dump-based approach necessary, a good idea, or
not a good idea, and why?

A little background that may be relevant: the application is in
production so each new version must import data as a part of the
deployment process, and for obvious reasons on integration and UAT
tiers this should be real data. However, this app is not “shipped” and
installed by customers; there is only one instance of production at a
given time, maintained internally. I realize there will be details
specific to my project so the answer will have to address the general
case.
A lot of bad stuff arises from having different scripts for fresh installs and upgrades. I worked on the Oracle E-Business Suite in the early 2000s, and in my experience the adpatch tool eliminated that fatal variation.
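To make the "one path for install and upgrade" idea concrete, here is a minimal sketch in Python using SQLite from the standard library. It is not how adpatch works; the `schema_version` table, the `MIGRATIONS` list and the `users` table are all hypothetical names chosen for illustration. The point is only that a fresh database and an existing one run the same ordered scripts and end up with the same schema.

```python
import sqlite3

# Hypothetical migrations; in a real project each would be a script file in source control.
MIGRATIONS = [
    (1, "CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL)"),
    (2, "ALTER TABLE users ADD COLUMN email TEXT"),
]

def migrate(conn):
    """Apply every migration not yet recorded and return the versions applied."""
    conn.execute("CREATE TABLE IF NOT EXISTS schema_version (version INTEGER PRIMARY KEY)")
    applied = {row[0] for row in conn.execute("SELECT version FROM schema_version")}
    newly_applied = []
    for version, statement in MIGRATIONS:
        if version in applied:
            continue  # already run against this database
        conn.execute(statement)
        conn.execute("INSERT INTO schema_version (version) VALUES (?)", (version,))
        conn.commit()
        newly_applied.append(version)
    return newly_applied

def column_names(conn, table):
    """Return the column names of a table, in declaration order."""
    return [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]

# Fresh install: an empty database runs every migration.
fresh = sqlite3.connect(":memory:")
fresh_applied = migrate(fresh)

# Upgrade: a database already at version 1, holding user data, runs only what is new.
existing = sqlite3.connect(":memory:")
existing.execute("CREATE TABLE schema_version (version INTEGER PRIMARY KEY)")
existing.execute(MIGRATIONS[0][1])
existing.execute("INSERT INTO schema_version (version) VALUES (1)")
existing.execute("INSERT INTO users (name) VALUES ('alice')")
existing.commit()
upgrade_applied = migrate(existing)

print(fresh_applied, upgrade_applied)  # [1, 2] [2]
```

Both databases finish with identical `users` columns, and the existing row survives the upgrade, which is the property a dump-plus-alter-scripts process has to reproduce by hand.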
A key technique, which I absolutely loathed after Oracle acquired my employer, was the insistence that all scripts be completely re-runnable and run with no errors at all. Once we got our patch quality up to snuff, I realized it was totally genius.
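As a rough illustration of what "re-runnable" means in practice, the sketch below (Python with the standard-library `sqlite3` module; the table, column and config names are made up) checks the current state before each change, so running it a second or third time is a clean no-op rather than an error.

```python
import sqlite3

def column_exists(conn, table, column):
    """Return True if the table already has the named column."""
    rows = conn.execute(f"PRAGMA table_info({table})").fetchall()
    return any(row[1] == column for row in rows)

def apply_email_change(conn):
    """A re-runnable step: every statement is safe to execute repeatedly."""
    conn.execute("CREATE TABLE IF NOT EXISTS users (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
    # SQLite raises an error if the column already exists, so check first.
    if not column_exists(conn, "users", "email"):
        conn.execute("ALTER TABLE users ADD COLUMN email TEXT")
    # Seed configuration data without creating duplicates on a re-run.
    conn.execute("CREATE TABLE IF NOT EXISTS app_config (key TEXT PRIMARY KEY, value TEXT)")
    conn.execute("INSERT OR IGNORE INTO app_config (key, value) VALUES ('mail_enabled', 'true')")
    conn.commit()

conn = sqlite3.connect(":memory:")
for _ in range(3):
    apply_email_change(conn)  # the second and third runs change nothing and raise nothing
```

Other databases have their own ways to express the same guard; the principle is that a script interrupted halfway can simply be run again.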
Another key technique I learned was having strong database comparison/verification scripts.
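A verification script can be as simple as comparing the catalog of a reference database (built from the scripts in source control) against the one you just deployed to. The following is only a sketch against SQLite with hypothetical tables; a real one would also look at indexes, constraints and so on.

```python
import sqlite3

def schema_snapshot(conn):
    """Map each user table to its columns as (name, declared type, notnull, pk)."""
    names = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
    tables = [n for n in names if not n.startswith("sqlite_")]  # skip internal tables
    return {
        t: [(c[1], c[2], c[3], c[5]) for c in conn.execute(f"PRAGMA table_info({t})")]
        for t in tables
    }

def schema_diff(expected, actual):
    """Return a list of human-readable differences; an empty list means a match."""
    problems = []
    for table in sorted(set(expected) | set(actual)):
        if table not in actual:
            problems.append(f"missing table: {table}")
        elif table not in expected:
            problems.append(f"unexpected table: {table}")
        elif expected[table] != actual[table]:
            problems.append(f"column mismatch in {table}")
    return problems

# Reference schema, as the scripts in source control say it should be.
reference = sqlite3.connect(":memory:")
reference.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL, email TEXT)")
reference.execute("CREATE TABLE audit_log (id INTEGER PRIMARY KEY, message TEXT)")

# Deployed database that missed a migration.
deployed = sqlite3.connect(":memory:")
deployed.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")

problems = schema_diff(schema_snapshot(reference), schema_snapshot(deployed))
print(problems)  # ['missing table: audit_log', 'column mismatch in users']
```

Run as the last step of every deployment, a check like this fails loudly when a tier has drifted, instead of letting the drift surface later as an application bug.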
If your schema is in good shape, your datasets will mostly look after themselves.