My database occasionally contains entries that are wrong, but instead of altering the data directly I’d like to keep a revision history of the changes.
These changes occur very rarely.
Ideally something like this:
(original table fields) | revision_version | origin | user | timestamp
So say I had a table called posts with the following schema:
title | description | timestamp | author
An additional table called posts_revisions would be created like so:
title | description | timestamp | author | revision_version | origin | user | timestamp
- origin is the source of the change, be it a bot, a user, or what have you.
As you can imagine, this is a rather large change to the existing database. My current concern is the performance hit of checking the _revisions tables for every query. Is this best practice for this sort of thing?
For this type of problem, I keep a current table and a history table.
The history table has the following additional columns: an effective date, an end date, a version, and the usual id, CreatedAt, and CreatedBy.
The effective and end dates are the time span during which the values are valid. The version is simply incremented every time a record changes. The id, CreatedAt, and CreatedBy are columns I put into almost every table in the database.
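A minimal sketch of what such a table pair might look like, using SQLite through Python's sqlite3 module purely so the example runs anywhere. The names post_id, effective_date, end_date, and the helper title_as_of are illustrative assumptions, not the original column names. A NULL end_date is assumed to mark the row that is still valid.

```python
import sqlite3

def create_schema(conn):
    # Current table plus a history table carrying the extra columns.
    conn.executescript("""
        CREATE TABLE posts (
            id          INTEGER PRIMARY KEY,
            title       TEXT NOT NULL,
            description TEXT,
            author      TEXT,
            CreatedAt   TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP,
            CreatedBy   TEXT NOT NULL
        );
        CREATE TABLE posts_history (
            id             INTEGER PRIMARY KEY,   -- the history row's own id
            post_id        INTEGER NOT NULL REFERENCES posts(id),
            title          TEXT NOT NULL,
            description    TEXT,
            author         TEXT,
            version        INTEGER NOT NULL,      -- incremented on each change
            effective_date TEXT NOT NULL,         -- start of validity
            end_date       TEXT,                  -- NULL means still valid
            CreatedAt      TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP,
            CreatedBy      TEXT NOT NULL,
            UNIQUE (post_id, version)
        );
    """)

def title_as_of(conn, post_id, when):
    # Return the title that was valid at the given timestamp, or None.
    row = conn.execute(
        """
        SELECT title FROM posts_history
        WHERE post_id = ?
          AND effective_date <= ?
          AND (end_date IS NULL OR end_date > ?)
        """,
        (post_id, when, when),
    ).fetchone()
    return row[0] if row else None

conn = sqlite3.connect(":memory:")
create_schema(conn)
conn.execute(
    "INSERT INTO posts (id, title, CreatedBy) VALUES (1, 'Fixed title', 'alice')"
)
conn.executemany(
    "INSERT INTO posts_history "
    "(post_id, title, version, effective_date, end_date, CreatedBy) "
    "VALUES (?, ?, ?, ?, ?, ?)",
    [
        (1, "Typo titel", 1, "2024-01-01 00:00:00", "2024-02-01 00:00:00", "alice"),
        (1, "Fixed title", 2, "2024-02-01 00:00:00", None, "bot"),
    ],
)
print(title_as_of(conn, 1, "2024-01-15 00:00:00"))
```

Timestamps are stored as ISO-style strings here so that plain string comparison orders them correctly; a database with a native datetime type would use that instead.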
Generally, I keep the history table up to date with nightly jobs that compare the tables and then use MERGE to combine the data. An alternative is to wrap all changes in stored procedures and update both tables there. Another alternative is to use triggers that detect when a change occurs. However, I shy away from triggers, preferring the first two alternatives.
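To illustrate the second alternative, here is a rough sketch in which a single Python function stands in for the stored procedure, since SQLite has none. The function save_post, the changed_by column, and the trimmed-down schema are all hypothetical. Both tables are written in one transaction, so they cannot drift apart.

```python
import sqlite3

# Reduced, illustrative schema: a current table and its history table.
SCHEMA = """
CREATE TABLE posts (
    id          INTEGER PRIMARY KEY,
    title       TEXT NOT NULL,
    description TEXT
);
CREATE TABLE posts_history (
    post_id        INTEGER NOT NULL,
    title          TEXT NOT NULL,
    description    TEXT,
    version        INTEGER NOT NULL,
    effective_date TEXT NOT NULL,
    end_date       TEXT,              -- NULL means still valid
    changed_by     TEXT NOT NULL,
    PRIMARY KEY (post_id, version)
);
"""

def save_post(conn, post_id, title, description, changed_by, now):
    """Insert or update a post and record the change in posts_history."""
    with conn:  # one transaction: both tables change, or neither does
        latest = conn.execute(
            "SELECT COALESCE(MAX(version), 0) FROM posts_history WHERE post_id = ?",
            (post_id,),
        ).fetchone()[0]
        # Close the history row that was valid until now, if there is one.
        conn.execute(
            "UPDATE posts_history SET end_date = ? "
            "WHERE post_id = ? AND end_date IS NULL",
            (now, post_id),
        )
        # Open a new history row with the next version number.
        conn.execute(
            "INSERT INTO posts_history "
            "(post_id, title, description, version, effective_date, end_date, changed_by) "
            "VALUES (?, ?, ?, ?, ?, NULL, ?)",
            (post_id, title, description, latest + 1, now, changed_by),
        )
        # Update the current table, inserting the row if it does not exist yet.
        cur = conn.execute(
            "UPDATE posts SET title = ?, description = ? WHERE id = ?",
            (title, description, post_id),
        )
        if cur.rowcount == 0:
            conn.execute(
                "INSERT INTO posts (id, title, description) VALUES (?, ?, ?)",
                (post_id, title, description),
            )

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
save_post(conn, 1, "Helo world", "first draft", "alice", "2024-01-01 00:00:00")
save_post(conn, 1, "Hello world", "first draft", "bot", "2024-01-02 00:00:00")
```

Ordinary reads keep going to posts alone, so under this layout the history table adds cost only on the rare writes, not on every query.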
I must admit that disk space is not a big consideration for these tables, so there is no problem storing the data twice: once in the current table and once in the history table. It would be just a minor tweak to store only the superseded rows in the history table, with the current records in the “current” table.
One downside to this approach is changing the structure of the base table. If you want to add a column, you need to add it to the history table as well as the base table.
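One way to make that harder to forget is to route schema changes through a small helper that touches both tables. The sketch below again assumes SQLite via Python's sqlite3; the helper names and the summary column are made up for illustration.

```python
import sqlite3

def add_column_everywhere(conn, column_def, tables=("posts", "posts_history")):
    # Apply the same ADD COLUMN to the base table and its history table.
    # Table names come from a fixed tuple, never from user input.
    for table in tables:
        conn.execute(f"ALTER TABLE {table} ADD COLUMN {column_def}")

def column_names(conn, table):
    # PRAGMA table_info yields one row per column; index 1 is the name.
    return [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE posts (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE posts_history (post_id INTEGER, title TEXT, version INTEGER);
""")
add_column_everywhere(conn, "summary TEXT")
print(column_names(conn, "posts"))
print(column_names(conn, "posts_history"))
```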