I have already decided to use the Horde Text_Diff engine in a LAMP stack for calculating diffs and rendering them. My question is this:
What would be a good way of actually storing the incrementals in a database? I’ve never had to design this kind of database application before, and it appears that most engines want a fully serialized copy of the entire original and changed text in order to render the differences.
If that’s the case, then how can I store the data of the diff in a database without storing the entire new document?
(NOTE: For this particular purpose, it will always be current version->proposed diff->new current version, meaning that I’m trying to store an actual diff instead of a reverse diff.)
I think you should be able to work with the diff and patch utilities. diff produces the difference between two texts (or files) in the form of the changes only, and patch applies such a set of changes to the original. The resulting patch can then be stored in the database. You still need the original text and all patches up to the latest revision.
For PHP, the xdiff extension can be used for creating diffs of strings and files and for applying them as patches.
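To make the "changes only" idea concrete, here is a minimal sketch in Python rather than PHP, using the standard-library difflib in place of xdiff or the patch utility. The JSON patch format and the function names make_patch and apply_patch are assumptions made up for this illustration, not something Horde Text_Diff or xdiff produces.

```python
import difflib
import json


def make_patch(old: str, new: str) -> str:
    """Return a forward patch (JSON) that holds only the changed regions."""
    old_lines = old.splitlines(keepends=True)
    new_lines = new.splitlines(keepends=True)
    matcher = difflib.SequenceMatcher(a=old_lines, b=new_lines, autojunk=False)
    ops = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            # Replace old_lines[i1:i2] with these new lines.
            # Unchanged lines are never copied into the patch.
            ops.append([i1, i2, new_lines[j1:j2]])
    return json.dumps(ops)


def apply_patch(old: str, patch: str) -> str:
    """Apply a patch from make_patch to the text it was computed against."""
    lines = old.splitlines(keepends=True)
    # Work from the end of the text so earlier line indices stay valid.
    for i1, i2, replacement in reversed(json.loads(patch)):
        lines[i1:i2] = replacement
    return "".join(lines)
```

The patch string is what would go into the database. It only applies cleanly to the exact text it was computed from, which is why the order of the stored patches matters.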
Storing DIFFs in the database
To store the diffs in the database you need to preserve three things: the original text, the contents of each diff, and the order of the diffs.
I assume you are already storing the original text. The diffs can then be stored in a diffs table that holds a reference to the original text, an auto-increment key to preserve the order, and the text content of the diff. Insert the diffs one after the other in the correct order and you should be fine.
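A possible layout for such a table, sketched with Python's built-in sqlite3 module so it runs without a server. The table and column names (documents, diffs, original_text, diff_text) and the helper functions are assumptions for illustration; on MySQL the key column would be declared with AUTO_INCREMENT instead.

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS documents (
    id            INTEGER PRIMARY KEY AUTOINCREMENT,
    original_text TEXT NOT NULL
);
CREATE TABLE IF NOT EXISTS diffs (
    id          INTEGER PRIMARY KEY AUTOINCREMENT,  -- insertion order is revision order
    document_id INTEGER NOT NULL REFERENCES documents(id),
    diff_text   TEXT NOT NULL
);
"""


def open_store(path=":memory:"):
    """Open (or create) the database and make sure both tables exist."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def add_document(conn, text):
    """Store the original text and return its id."""
    cur = conn.execute("INSERT INTO documents (original_text) VALUES (?)", (text,))
    conn.commit()
    return cur.lastrowid


def add_diff(conn, document_id, diff_text):
    """Append one diff; its auto-increment id fixes its place in the sequence."""
    cur = conn.execute(
        "INSERT INTO diffs (document_id, diff_text) VALUES (?, ?)",
        (document_id, diff_text),
    )
    conn.commit()
    return cur.lastrowid


def list_diffs(conn, document_id):
    """Return the diffs of one document in the order they were inserted."""
    rows = conn.execute(
        "SELECT diff_text FROM diffs WHERE document_id = ? ORDER BY id",
        (document_id,),
    )
    return [row[0] for row in rows]
```

The diff is stored as opaque text here, so the same table works whether the content is a unified diff, xdiff output, or some other patch format.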
To recreate the current version, query the original version and all of its diffs in order. Then apply one diff after the other until you reach the version you want.
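The whole cycle (current version, proposed diff, new current version) could look roughly like the sketch below. It is Python with sqlite3 and difflib standing in for PHP, MySQL and xdiff; the schema, the JSON patch format and the names propose and current_text are all hypothetical.

```python
import difflib
import json
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS documents (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    original_text TEXT NOT NULL
);
CREATE TABLE IF NOT EXISTS diffs (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    document_id INTEGER NOT NULL REFERENCES documents(id),
    diff_text TEXT NOT NULL
);
"""


def make_patch(old, new):
    """Forward patch as JSON: only the changed line ranges and their new content."""
    a, b = old.splitlines(keepends=True), new.splitlines(keepends=True)
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    return json.dumps(
        [[i1, i2, b[j1:j2]]
         for tag, i1, i2, j1, j2 in matcher.get_opcodes() if tag != "equal"]
    )


def apply_patch(old, patch):
    """Apply a make_patch result; edits run back to front so indices stay valid."""
    lines = old.splitlines(keepends=True)
    for i1, i2, replacement in reversed(json.loads(patch)):
        lines[i1:i2] = replacement
    return "".join(lines)


def current_text(conn, document_id):
    """Start from the original and apply every stored diff in id order."""
    text = conn.execute(
        "SELECT original_text FROM documents WHERE id = ?", (document_id,)
    ).fetchone()[0]
    rows = conn.execute(
        "SELECT diff_text FROM diffs WHERE document_id = ? ORDER BY id",
        (document_id,),
    ).fetchall()
    for (diff_text,) in rows:
        text = apply_patch(text, diff_text)
    return text


def propose(conn, document_id, new_text):
    """Store only the diff between the current version and new_text."""
    patch = make_patch(current_text(conn, document_id), new_text)
    conn.execute(
        "INSERT INTO diffs (document_id, diff_text) VALUES (?, ?)",
        (document_id, patch),
    )
    conn.commit()
```

Note that the full new text is needed only at the moment the diff is computed; it is never written to the database.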
Alternatively, you can add another table that stores the full result of specific revisions, so that you do not have to replay a long chain of diffs over and over again. The trade-off is that this makes the data in the database redundant.
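One way this snapshot idea might be sketched, again in Python with sqlite3 as a stand-in: a snapshots table keyed by the diff it reflects, and a rebuild that starts from the nearest snapshot instead of the original. The snapshots table, its columns, and the functions rebuild and save_snapshot are assumptions for illustration; apply_patch is any function taking (text, diff) and returning the patched text.

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS documents (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    original_text TEXT NOT NULL
);
CREATE TABLE IF NOT EXISTS diffs (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    document_id INTEGER NOT NULL REFERENCES documents(id),
    diff_text TEXT NOT NULL
);
CREATE TABLE IF NOT EXISTS snapshots (
    diff_id     INTEGER PRIMARY KEY REFERENCES diffs(id),  -- revision this text reflects
    document_id INTEGER NOT NULL REFERENCES documents(id),
    full_text   TEXT NOT NULL
);
"""


def save_snapshot(conn, document_id, diff_id, full_text):
    """Cache the full text as it stands after diff_id has been applied."""
    conn.execute(
        "INSERT OR REPLACE INTO snapshots (diff_id, document_id, full_text) "
        "VALUES (?, ?, ?)",
        (diff_id, document_id, full_text),
    )
    conn.commit()


def rebuild(conn, document_id, apply_patch, upto_diff_id=None):
    """Rebuild the text as of upto_diff_id (latest revision if None)."""
    if upto_diff_id is None:
        row = conn.execute(
            "SELECT MAX(id) FROM diffs WHERE document_id = ?", (document_id,)
        ).fetchone()
        upto_diff_id = row[0] or 0  # 0 means "no diffs yet"
    # Nearest snapshot at or before the requested revision, if any.
    snap = conn.execute(
        "SELECT diff_id, full_text FROM snapshots "
        "WHERE document_id = ? AND diff_id <= ? ORDER BY diff_id DESC LIMIT 1",
        (document_id, upto_diff_id),
    ).fetchone()
    if snap:
        start_id, text = snap
    else:
        start_id = 0
        text = conn.execute(
            "SELECT original_text FROM documents WHERE id = ?", (document_id,)
        ).fetchone()[0]
    # Replay only the diffs that come after the starting point.
    rows = conn.execute(
        "SELECT diff_text FROM diffs "
        "WHERE document_id = ? AND id > ? AND id <= ? ORDER BY id",
        (document_id, start_id, upto_diff_id),
    ).fetchall()
    for (diff_text,) in rows:
        text = apply_patch(text, diff_text)
    return text
```

How often to snapshot (every N diffs, or only the latest revision) is a tuning choice between storage and replay time. The snapshots can always be dropped and regenerated from the original plus the diffs.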