I’m just looking for ideas/suggestions here; I’m not asking for a full-on solution (although if you have one, I’d be happy to look at it).
I’m trying to find a way to upload only the changes made to a text. It will most likely be a cloud-based application using jQuery and HTML on the client, with a PHP back-end.
For example, if I have text like
asdfghjklasdfghjkl
And I change it to
asdfghjklXasdfghjkl
I don’t want to have to upload the whole thing again (the text can get pretty big).
Instead, something like 8,X sent to the server could mean:
insert an X at position 8
Or D8,3 could mean:
go to position 8 and delete the 3 characters before it
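Here is a minimal sketch of how a server could parse and apply those two commands. It is written in TypeScript to match the JavaScript client. The exact semantics are an assumption, since the question doesn’t pin them down: positions are 0-based character offsets, an insert places its text at that offset, and a delete removes the given number of characters immediately before it. parseEdit and applyEdit are hypothetical names.

```typescript
// Hypothetical wire format based on the question's examples (semantics assumed):
//   "<pos>,<text>"    insert <text> at 0-based character offset <pos>
//   "D<pos>,<count>"  delete the <count> characters immediately before <pos>
type Edit =
  | { kind: "insert"; pos: number; text: string }
  | { kind: "delete"; pos: number; count: number };

// True for non-negative whole numbers (also rejects NaN from malformed input).
function isIndex(n: number): boolean {
  return n >= 0 && n === Math.floor(n);
}

function parseEdit(raw: string): Edit {
  const comma = raw.indexOf(",");
  if (comma < 0) throw new Error("malformed edit: " + raw);
  const head = raw.slice(0, comma);
  const tail = raw.slice(comma + 1); // inserted text may itself contain commas
  if (head.charAt(0) === "D") {
    return { kind: "delete", pos: Number(head.slice(1)), count: Number(tail) };
  }
  return { kind: "insert", pos: Number(head), text: tail };
}

function applyEdit(doc: string, edit: Edit): string {
  if (edit.kind === "insert") {
    if (!isIndex(edit.pos) || edit.pos > doc.length) throw new Error("insert out of range");
    return doc.slice(0, edit.pos) + edit.text + doc.slice(edit.pos);
  }
  const start = edit.pos - edit.count;
  if (!isIndex(edit.pos) || !isIndex(edit.count) || start < 0 || edit.pos > doc.length) {
    throw new Error("delete out of range");
  }
  return doc.slice(0, start) + doc.slice(edit.pos);
}

// The change from the question: insert "X" after "asdfghjkl" (9 characters).
const before = "asdfghjklasdfghjkl";
const after = applyEdit(before, parseEdit("9,X"));
console.log(after); // asdfghjklXasdfghjkl
```

Under these rules the X from the example above is 9,X, because asdfghjkl is 9 characters long. The bounds checks are there so that a command pointing outside the document is rejected, rather than silently producing shifted, corrupted text.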
However, if a single request is corrupted en route to the server, the whole document could end up corrupted, because every later position would then point to the wrong place. A simple hash could detect the corruption, but how would one recover from it? The client has all of the data, but it may be very large, so re-uploading all of it is probably not practical.
Thanks for reading through this. Here is a short summary of what I’d like suggestions on:
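One way to recover without a full re-upload is to make sure the server never commits an edit it cannot verify. In the sketch below, which is one possible design and not a standard protocol, each edit message carries a checksum of the text the client expects after the edit. The server applies the edit to a copy, compares checksums, and only then replaces its stored text. A corrupted request is rejected, the server’s copy stays at the last good state, and the client only has to resend that one edit. The Adler-32-style checksum over UTF-16 code units is a stand-in for a real hash such as SHA-256; EditMessage, makeMessage, and receive are hypothetical names.

```typescript
// Adler-32-style checksum over UTF-16 code units. A stand-in for a real hash
// (e.g. SHA-256); it catches accidental corruption, not deliberate tampering.
function checksum(text: string): number {
  const MOD = 65521;
  let a = 1;
  let b = 0;
  for (let i = 0; i < text.length; i++) {
    a = (a + text.charCodeAt(i)) % MOD;
    b = (b + a) % MOD;
  }
  return b * 65536 + a;
}

// Hypothetical message: replace deleteCount chars at pos with insert, plus the
// checksum the client expects the whole document to have afterwards.
interface EditMessage {
  pos: number;
  deleteCount: number;
  insert: string;
  expected: number;
}

function splice(doc: string, pos: number, deleteCount: number, insert: string): string {
  return doc.slice(0, pos) + insert + doc.slice(pos + deleteCount);
}

// Client side: build the message from the client's current copy.
function makeMessage(doc: string, pos: number, deleteCount: number, insert: string): EditMessage {
  return { pos, deleteCount, insert, expected: checksum(splice(doc, pos, deleteCount, insert)) };
}

// Server side: apply to a candidate copy and commit only if the checksum matches.
function receive(doc: string, msg: EditMessage): { accepted: boolean; doc: string } {
  const inRange =
    msg.pos >= 0 &&
    msg.deleteCount >= 0 &&
    msg.pos === Math.floor(msg.pos) &&
    msg.deleteCount === Math.floor(msg.deleteCount) &&
    msg.pos + msg.deleteCount <= doc.length;
  if (!inRange) return { accepted: false, doc };
  const candidate = splice(doc, msg.pos, msg.deleteCount, msg.insert);
  if (checksum(candidate) !== msg.expected) {
    // Corrupted (or out-of-sync) edit: keep the last good copy; the client resends.
    return { accepted: false, doc };
  }
  return { accepted: true, doc: candidate };
}

const serverCopy = "asdfghjklasdfghjkl";
const message = makeMessage(serverCopy, 9, 0, "X");
const garbled: EditMessage = { ...message, insert: "Y" }; // simulate corruption in transit
console.log(receive(serverCopy, garbled).accepted); // false
console.log(receive(serverCopy, message).doc); // asdfghjklXasdfghjkl
```

Recomputing the checksum over the whole document costs time proportional to its length on every edit. That is usually cheap compared with uploading the document, but for very large texts you may want to cache or checksum incrementally.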
- Change/Modification Detection
- Method to communicate the changes
- Recovery from corruption
- Anything else that needs improvement
There is already a widely accepted format for transmitting this kind of “difference” information: the unified diff format.
Google’s diff-match-patch library provides implementations in Java, JavaScript, C++, C#, Lua and Python.
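You should be able to keep the “original text” and the “modified text” in variables on the client, generate the diff in JavaScript with diff-match-patch, and send it to the server along with a hash. The server then reconstructs the modified text from the diff and checks the hash. For the reconstruction you can use diff-match-patch, or the Unix “patch” program if you generate a standard line-based unified diff instead (diff-match-patch’s own patch text uses character offsets rather than line numbers, so “patch” can’t apply it directly).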
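To show what “generate the diff, send it, reconstruct it” looks like end to end, here is a deliberately simplified stand-in for diff-match-patch. It does not use the library’s API; it strips the common prefix and suffix and sends the changed middle as a single splice. That works well for one contiguous edit, but scattered edits become one large splice covering everything between them, which is exactly what a real diff library avoids. diffSplice and applySplice are hypothetical names.

```typescript
// Simplified stand-in for a real diff (NOT the diff-match-patch API): strip the
// common prefix and suffix, and describe the changed middle as one splice.
interface Splice {
  pos: number; // 0-based offset where the change starts
  deleteCount: number; // characters removed from the original at pos
  insert: string; // characters inserted in their place
}

function diffSplice(original: string, modified: string): Splice {
  const maxPrefix = Math.min(original.length, modified.length);
  let prefix = 0;
  while (prefix < maxPrefix && original.charCodeAt(prefix) === modified.charCodeAt(prefix)) {
    prefix++;
  }
  // The suffix must not overlap the prefix in either string.
  const maxSuffix = maxPrefix - prefix;
  let suffix = 0;
  while (
    suffix < maxSuffix &&
    original.charCodeAt(original.length - 1 - suffix) ===
      modified.charCodeAt(modified.length - 1 - suffix)
  ) {
    suffix++;
  }
  return {
    pos: prefix,
    deleteCount: original.length - prefix - suffix,
    insert: modified.slice(prefix, modified.length - suffix),
  };
}

// Server side: rebuild the modified text from its own copy of the original.
function applySplice(text: string, s: Splice): string {
  return text.slice(0, s.pos) + s.insert + text.slice(s.pos + s.deleteCount);
}

const original = "asdfghjklasdfghjkl";
const modified = "asdfghjklXasdfghjkl";
const payload = JSON.stringify(diffSplice(original, modified));
console.log(payload); // {"pos":9,"deleteCount":0,"insert":"X"}
const rebuilt = applySplice(original, JSON.parse(payload) as Splice);
console.log(rebuilt === modified); // true
```

With the real library, the client would typically call patch_make and patch_toText on a diff_match_patch instance, and the server would call patch_fromText and patch_apply from a port in its own language.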
You might also want to include a “version” (or a modified date) when you first send the original text to the client. The client then includes the same version (or date) in the diff request it sends to the server, and the server verifies it before applying the diff, to make sure the server’s copy of the text has not diverged from the client’s copy while the modification was being made. Of course, for this to work, the server must update the version number every time the master copy is updated.
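Here is a sketch of that version check. StoredDoc (the master copy plus a version counter) and DiffRequest (the change plus the baseVersion the client started from) are illustrative names, not an existing API. The server rejects any request whose base version isn’t current and bumps the version on every successful update.

```typescript
// Hypothetical server-side record: the master copy plus a version counter.
interface StoredDoc {
  version: number;
  text: string;
}

// Hypothetical request: the change, plus the version the client started from.
interface DiffRequest {
  baseVersion: number;
  pos: number;
  deleteCount: number;
  insert: string;
}

type ApplyResult =
  | { status: "applied"; version: number }
  | { status: "conflict"; currentVersion: number };

function applyRequest(doc: StoredDoc, req: DiffRequest): ApplyResult {
  if (req.baseVersion !== doc.version) {
    // The master copy changed since the client fetched it; the client must
    // fetch the latest text (or rebase its change) before retrying.
    return { status: "conflict", currentVersion: doc.version };
  }
  // Bounds checking omitted for brevity.
  doc.text = doc.text.slice(0, req.pos) + req.insert + doc.text.slice(req.pos + req.deleteCount);
  doc.version += 1; // bump on every update so stale requests are rejected
  return { status: "applied", version: doc.version };
}

const master: StoredDoc = { version: 1, text: "asdfghjklasdfghjkl" };
console.log(applyRequest(master, { baseVersion: 1, pos: 9, deleteCount: 0, insert: "X" }).status); // applied
console.log(applyRequest(master, { baseVersion: 1, pos: 0, deleteCount: 1, insert: "" }).status); // conflict
```

In a PHP back-end backed by a database, the same check would usually be done as one atomic conditional UPDATE whose WHERE clause includes the expected version, so that two concurrent requests can’t both pass the check.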