Please forgive my long question. I have an idea for a design that I would like some comments on. Is it a good idea? What pitfalls should I be aware of? Are there similar implementations that work better?
My situation is as follows:
I am working on a rewrite of a Windows Forms application that connects to a SQL Server 2008 (earlier SQL Server 2005) database. The application is an ‘expert system’ for an engineering company where we store structured data about constructions. We control all installations of the client software. We have no external customers or users; everyone is internal to the company and can be trusted not to do anything malicious to the software or database.
The current design doesn’t have many tables (about 10 to 20), but some of them hold millions of records that belong to several hundred constructions. The system’s performance has been OK so far, but it is starting to degrade as we push the limits of the design.
As part of the rewrite I am considering splitting the database into one master database and several ‘child’ databases where each describes one construction. Each child database should be of identical design. This should eliminate the performance problems we are seeing today since the data stored in each database would be less than one percent of the total data amount.
My concern is that instead of maintaining one database we will now have hundreds of databases that must be kept up to date. The system is constantly evolving as the company’s requirements change (you know how it is), and while we try to plan ahead to reduce the number of changes, changes will come. So we will need a way to keep track of all database changes so they can be applied to the child databases. Updating the client application won’t be a problem; we have good control of that aspect.
I am thinking of a change tracking system where we store the database scripts for all changes in a table in the master database. Each change gets a version number, and each child database stores its own current version number. When the client program connects to a child database, it compares the child’s version number with the latest version number in the master database. If there are patches with a higher version number than the child’s, the client runs them in order and updates the child database to the latest version.
As I see it, this should work well. Any change to the system will first be tested and validated before being committed as a new version of the database. The change will then be applied to each child database the first time a user opens it. I suppose we would open the database in exclusive mode while applying the changes, but as long as the changes aren’t too frequent this should not be a problem.
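A minimal sketch of that version check, to make the idea concrete. It uses Python with SQLite as a stand-in for SQL Server so that it runs on its own; the table names `schema_patches` and `schema_version`, the function `apply_pending_patches`, and the sample `beam` table are all assumptions made up for illustration.

```python
import sqlite3

def apply_pending_patches(master, child):
    """Run every patch stored in `master` that is newer than `child`'s version."""
    current = child.execute("SELECT version FROM schema_version").fetchone()[0]
    pending = master.execute(
        "SELECT version, script FROM schema_patches "
        "WHERE version > ? ORDER BY version",
        (current,),
    ).fetchall()
    for version, script in pending:
        child.executescript(script)  # run the stored change script
        # Record the new version right after each patch, so an interrupted
        # upgrade resumes from the last patch that completed.
        child.execute("UPDATE schema_version SET version = ?", (version,))
        child.commit()
        current = version
    return current

# Master database: holds the ordered list of change scripts (hypothetical data).
master = sqlite3.connect(":memory:")
master.execute(
    "CREATE TABLE schema_patches (version INTEGER PRIMARY KEY, script TEXT NOT NULL)"
)
master.executemany(
    "INSERT INTO schema_patches VALUES (?, ?)",
    [
        (1, "CREATE TABLE beam (id INTEGER PRIMARY KEY, length_mm REAL);"),
        (2, "ALTER TABLE beam ADD COLUMN material TEXT;"),
    ],
)

# Child database: starts at version 0 and only knows its own version number.
child = sqlite3.connect(":memory:")
child.execute("CREATE TABLE schema_version (version INTEGER NOT NULL)")
child.execute("INSERT INTO schema_version VALUES (0)")

new_version = apply_pending_patches(master, child)
```

Running the function a second time finds no pending patches and leaves the child untouched, which is what makes it safe to call on every connect.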
So what do you think? Will this work? Have any of you done something similar? Should we scrap the solution and go for the monolithic system instead?
I have a similar situation here, though I use MySQL. Every database has a versions table that contains the version (simply an integer) and a short comment on what changed in that version. I use a script to update the databases. A database change is usually a single function, though sometimes one change is spread over several functions. Each function has its version number in its name. The script looks up the highest version number in a database and applies, in order, only the functions with a higher version number.
This makes it easy to update databases (just add new change functions) and allows me to quickly upgrade a recovered database if necessary (just run the script again).
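Roughly, such a script could look like the sketch below. This is not my actual script: it is written in Python against SQLite so it can run standalone, and the function bodies, the `construction` table, and the `upgrade` helper are invented for illustration. Only the `versions` table and the version number in the function name come from the description above.

```python
import re
import sqlite3

def change101(db):
    """create construction table"""
    db.execute("CREATE TABLE construction (id INTEGER PRIMARY KEY, name TEXT)")

def change102(db):
    """add site column"""
    db.execute("ALTER TABLE construction ADD COLUMN site TEXT")

def change103(db):
    """index construction by site"""
    db.execute("CREATE INDEX idx_construction_site ON construction (site)")

# New changes are added by writing a function and listing it here.
CHANGES = [change101, change102, change103]

def version_of(func):
    """Extract the version number from a name like 'change104'."""
    return int(re.fullmatch(r"change(\d+)", func.__name__).group(1))

def upgrade(db):
    """Apply, in order, every change newer than the database's highest version."""
    db.execute(
        "CREATE TABLE IF NOT EXISTS versions "
        "(version INTEGER PRIMARY KEY, comment TEXT)"
    )
    highest = db.execute("SELECT COALESCE(MAX(version), 0) FROM versions").fetchone()[0]
    applied = []
    for func in sorted(CHANGES, key=version_of):
        version = version_of(func)
        if version <= highest:
            continue  # already applied to this database
        func(db)
        db.execute("INSERT INTO versions VALUES (?, ?)", (version, func.__doc__))
        db.commit()
        applied.append(version)
    return applied

db = sqlite3.connect(":memory:")
first_run = upgrade(db)   # applies 101, 102, 103
second_run = upgrade(db)  # nothing left to apply
```

Because each version is recorded as soon as its function succeeds, running the script again after a failure or on a restored backup picks up where the database actually is.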
Even if you test the changes beforehand, this approach allows for defensive changes. If you make heavy changes to a table and want to play it safe, you can convert the data into a new table in a change function of its own, here change104(), rather than altering the original in place. If something goes wrong in change104() and it throws an exception, you can simply delete the already converted data from the new table, fix your change function, and run the script again.
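As an illustration of that recovery path, here is a hedged sketch in Python with SQLite. Everything in it is hypothetical: the `beam` and `beam_new` tables, the unit conversion, and the `tolerate_bad_values` flag, which stands in for "fix your change function". Only the name change104() and the idea of clearing the new table before retrying come from the text above.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE beam (id INTEGER PRIMARY KEY, length_m TEXT)")
db.executemany(
    "INSERT INTO beam VALUES (?, ?)",
    [(1, "2.5"), (2, "n/a"), (3, "4")],  # row 2 holds a value that cannot be converted
)

def change103(db):
    # Create the new table next to the old one; the old data stays untouched.
    db.execute("CREATE TABLE beam_new (id INTEGER PRIMARY KEY, length_mm REAL)")

def change104(db, tolerate_bad_values=False):
    # Copy and convert the data row by row (metres as text -> millimetres).
    rows = db.execute("SELECT id, length_m FROM beam ORDER BY id").fetchall()
    for row_id, length_m in rows:
        try:
            length_mm = float(length_m) * 1000
        except ValueError:
            if not tolerate_bad_values:
                raise  # the first, unfixed version simply fails here
            length_mm = None  # the "fixed" version stores NULL instead
        db.execute("INSERT INTO beam_new VALUES (?, ?)", (row_id, length_mm))

change103(db)
try:
    change104(db)
except ValueError:
    # The conversion failed part-way: throw away what was already copied.
    db.execute("DELETE FROM beam_new")

# After fixing the change function, run it again from a clean new table.
change104(db, tolerate_bad_values=True)
rows = db.execute("SELECT id, length_mm FROM beam_new ORDER BY id").fetchall()
```

The original table is never modified by the conversion, so a failed attempt costs nothing more than clearing the new table.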
But I don’t think that changing a database dynamically when a client connects is a good idea. Some changes take a long time to run. Also, the software that accesses a database has to match the schema of that database, so you have to keep the two in sync somehow. Maybe you could distribute a new software version and upgrade each database when a client first uses it with the new software. But I haven’t tried that.
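One way to keep client and schema in sync is for the client to compare the database's version with the version it was built for before doing anything else. The sketch below is an assumption about how that could look, not something I have running; `EXPECTED_SCHEMA_VERSION` and `check_schema` are made-up names, and SQLite again stands in for the real server.

```python
import sqlite3

# The schema version this build of the client was written against (hypothetical).
EXPECTED_SCHEMA_VERSION = 104

def check_schema(db, expected=EXPECTED_SCHEMA_VERSION):
    """Compare the database's highest applied version with the client's."""
    actual = db.execute("SELECT COALESCE(MAX(version), 0) FROM versions").fetchone()[0]
    if actual == expected:
        return "ok"
    if actual < expected:
        return "database needs upgrade"  # run the update script before continuing
    return "client needs update"         # database is newer than this client

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE versions (version INTEGER PRIMARY KEY, comment TEXT)")
db.executemany(
    "INSERT INTO versions VALUES (?, ?)",
    [(102, "add site column"), (103, "index construction by site")],
)

status_before = check_schema(db)
db.execute("INSERT INTO versions VALUES (104, 'convert beam lengths')")
status_after = check_schema(db)
```

With a check like this the client can refuse to work against a mismatched database instead of failing later on a missing column.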