I have a text like this:
My name is Bob and I live in America.
I have some reference to the characters of this string, for example:
from 3 to 7 chars, deleted
at 3 char, added "surname"
from 20 to 25, deleted
at 25 char ....
but these statements aren’t ordered (and I can’t order them).
So, this is the question: how can I modify the text without losing the reference of the characters?
For example, if I apply the first statement, my text becomes:
My is Bob and I live in America.
and my third statement doesn’t work correctly anymore, because I’ve lost the reference to the original 20th character.
Keep in mind that the text is pretty long, so I can’t use any indexes…
First off, if this statement is true, the situation is hopeless: “these statements aren’t ordered (and I can’t order them)”.
An unordered list of patch statements could lead to a conflict. It will not be possible to decide what the right answer is in an automated fashion. For instance, consider the following situation:
You will wind up with different results depending on what order you execute these statements.
For instance, if you apply (1), then (2), then (3), you wind up with “apple banaconuts” –> “apple banaxyzconuts” –> “apple uts”.
But if you apply (3), then (2), then (1), you wind up with “apple onuts” –> “apple onutsxyz” –> [error — there aren’t enough characters to delete!].
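To make the order dependence concrete, here is a minimal sketch in Python. The starting string and the three statements are hypothetical: I picked them so that the (1), (2), (3) ordering reproduces the strings quoted above. The helper names `delete`, `insert` and `run` are mine as well, and indices are assumed to be 0-based.

```python
def delete(text: str, index: int, count: int) -> str:
    """Delete `count` characters starting at `index`, or fail if they don't exist."""
    if index + count > len(text):
        raise IndexError("not enough characters to delete")
    return text[:index] + text[index + count:]


def insert(text: str, index: int, new: str) -> str:
    """Insert `new` so that it starts at `index`."""
    if index > len(text):
        raise IndexError("insertion point is past the end of the text")
    return text[:index] + new + text[index:]


# Hypothetical patch statements; every index is applied to the CURRENT text.
statements = {
    1: lambda s: delete(s, 10, 5),
    2: lambda s: insert(s, 10, "xyz"),
    3: lambda s: delete(s, 6, 10),
}


def run(text: str, order: list[int]) -> str:
    """Apply the numbered statements to `text` in the given order."""
    for number in order:
        text = statements[number](text)
    return text


print(run("apple banana coconuts", [1, 2, 3]))  # apple uts

try:
    run("apple banana coconuts", [3, 2, 1])
except IndexError as exc:
    print("failed:", exc)  # statement (1) no longer has 5 characters to delete
```

Same statements, same input, and one ordering produces a string while the other blows up.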
Either you need a repeatable, agreed-upon ordering of the statements, or you cannot proceed any further. Even worse, discovering which orderings are valid (for example, eliminating all orderings where an impossible statement occurs, like “delete 10 characters from index 20”, when there is no index 20) is expensive: with n statements there are n! possible orderings, and a naive check has to try each of them.
If it turns out that the patches can be applied in a specific order (or at least in a repeatable, agreed-upon, deterministic order), the situation improves but is still obnoxious. Because the indices in any “patch” could be invalidated by any previous one, it’s not going to be possible to straightforwardly apply each statement. Instead, you’ll have to implement a small pseudo-diff. Here’s how I’d do it:
As you perform operations, keep a reference to the original string and store a “dirty pointer”. This is the latest contiguous index in the string which has had no operations performed on it. Any operation you perform whose index exceeds the dirty pointer must first be pre-processed.
If you encounter a clean operation, one whose index is less than or equal to the dirty pointer, you can apply it with no further work. The dirty pointer now moves to that operation’s index.
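If you encounter a dirty operation, one whose index is greater than the dirty pointer, you’ll have to do some work before you can apply it. Determine the real index of where the operation should be applied by looking at the previous operations: each earlier insertion before that index shifts it right by the length of the inserted text, and each earlier deletion before it shifts it left by the number of deleted characters. Then add that offset to the index and apply the operation.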
Execute each operation in turn until there are no more operations to execute.
The result is your transformed string.
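Here is a rough Python sketch of that procedure, not a definitive implementation. The `Op` class and the `apply_ops` function are names I made up for illustration. It rests on several assumptions:

- Every index refers to the original string and is 0-based.
- Deletions are given as a half-open range `[index, end)`.
- Operations don’t overlap, so no insertion falls strictly inside a deleted range.

One deliberate deviation: the sketch treats an operation as clean only when its index is strictly below the dirty pointer, so that a deletion starting exactly where text was inserted earlier doesn’t eat the inserted text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Op:
    kind: str       # "delete" or "insert"
    index: int      # 0-based position in the ORIGINAL string
    end: int = 0    # exclusive end in the ORIGINAL string (delete only)
    text: str = ""  # text to add (insert only)


def apply_ops(original: str, ops: list[Op]) -> str:
    result = original
    # Dirty pointer: the first original index whose position may have shifted.
    # Everything below it is still where it was in the original string.
    dirty = len(original) + 1
    shifts: list[tuple[int, int]] = []  # (boundary, length delta) per applied op

    for op in ops:
        if op.index < dirty:
            offset = 0  # clean: no earlier operation moved this position
        else:
            # dirty: add up the shifts caused by earlier operations before it
            offset = sum(delta for boundary, delta in shifts if op.index >= boundary)
        pos = op.index + offset

        if op.kind == "delete":
            length = op.end - op.index
            result = result[:pos] + result[pos + length:]
            boundary, delta = op.end, -length
        elif op.kind == "insert":
            result = result[:pos] + op.text + result[pos:]
            boundary, delta = op.index, len(op.text)
        else:
            raise ValueError(f"unknown operation kind: {op.kind!r}")

        shifts.append((boundary, delta))
        dirty = min(dirty, boundary)

    return result


text = "My name is Bob and I live in America."
ops = [
    Op("delete", 21, end=26),          # remove "live "
    Op("insert", 3, text="surname "),  # add "surname " before "name"
    Op("delete", 3, end=8),            # remove "name "
]
print(apply_ops(text, ops))  # My surname is Bob and I in America.
```

Because every index is translated from the original string, the order in which these non-overlapping operations arrive no longer changes the result. The dirty pointer is only a shortcut: for a clean operation the summed offset would be zero anyway.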