I’ve been working on a text editor for some time. I wrote a custom edit control from scratch, and I’ve got the basics down now. The problem I am facing concerns line management. Since my program relies on dividing the input text into lines (the text is printed line by line), line management is pretty important. I was using std::vector to store the line positions. I am using a Piece Table for my text processing, but for the sake of simplicity, let’s say that I have an array of characters. I add/insert an element into the line vector every time the user presses Enter. The issue is that every time the user inserts a character, every line position after it becomes wrong. For example:
0 1 2 3 4 5 6 7 8 9 10
text = ['h','e','l','l','o','\n','W','o','r','l','d']
state of line vector:
line[0] = 0
line[1] = 6
Let’s say the user inserts a character (‘x’) after text[2]:
0 1 2 3 4 5 6 7 8 9 10 11
text = ['h','e','l','x','l','o','\n','W','o','r','l','d']
state of line vector (now stale):
line[0] = 0
line[1] = 6 (should be 7, because 'W' moved from text[6] to text[7])
Because of the insertion, I would need to update the value of every element in the lines vector after the current line. The same goes for deletion. If there are 1000 lines in a program and the user edits the first line, it would be quite inefficient to update the other 999 elements.
I was thinking of keeping each line independent of the others. But that would lead to complications when an existing line is split into two lines. So I’d like to know: what’s a good way to go about this problem?
Edit:
Just to clarify, I am using a data structure called a Piece Table. I am not using an array of characters. Here is a description of the piece table data structure:
http://www.cs.unm.edu/~crowley/papers/sds.pdf
The classic data structure used by many editors is the “Gap Buffer”. It keeps a block of free space (the gap) at the cursor, where the activity happens, so that local insertions and deletions are fast. When the cursor moves and a change is then made, the gap moves to the new position first.
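To make the idea concrete, here is a minimal gap buffer sketch. The class name GapBuffer and all of its members are made up for illustration, and it handles single characters only; a real editor would add bulk operations and bounds handling that fits its own design.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Illustrative gap buffer: text lives in buf_ with a hole [gapStart_, gapEnd_).
class GapBuffer {
public:
    explicit GapBuffer(std::size_t capacity = 16)
        : buf_(capacity), gapStart_(0), gapEnd_(capacity) {}

    // Number of characters actually stored (storage minus the gap).
    std::size_t size() const { return buf_.size() - (gapEnd_ - gapStart_); }

    // Insert one character at logical position pos.
    void insert(std::size_t pos, char c) {
        assert(pos <= size());
        moveGap(pos);
        if (gapStart_ == gapEnd_) grow();
        buf_[gapStart_++] = c;
    }

    // Erase the character at logical position pos.
    void erase(std::size_t pos) {
        assert(pos < size());
        moveGap(pos);
        ++gapEnd_;  // the character right after the gap becomes part of it
    }

    // Copy out the text without the gap.
    std::string str() const {
        std::string out(buf_.begin(), buf_.begin() + gapStart_);
        out.append(buf_.begin() + gapEnd_, buf_.end());
        return out;
    }

private:
    // Slide the gap so that it starts at logical position pos.
    void moveGap(std::size_t pos) {
        while (gapStart_ > pos) buf_[--gapEnd_] = buf_[--gapStart_];
        while (gapStart_ < pos) buf_[gapStart_++] = buf_[gapEnd_++];
    }

    // Double the storage and keep the text after the gap at the very end.
    void grow() {
        std::size_t oldSize = buf_.size();
        std::size_t newSize = oldSize == 0 ? 16 : oldSize * 2;
        std::size_t tail = oldSize - gapEnd_;
        buf_.resize(newSize);
        std::copy_backward(buf_.begin() + gapEnd_, buf_.begin() + oldSize,
                           buf_.end());
        gapEnd_ = newSize - tail;
    }

    std::vector<char> buf_;
    std::size_t gapStart_;
    std::size_t gapEnd_;
};
```

Typing at the cursor only writes into the gap, so the cost of moving characters is paid when the edit position jumps, not on every keystroke.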
As for line calculations, modern systems are fast enough that you can pretty much just scan the buffer and look for line breaks. The nice thing is that most operations don’t need this, so you don’t have to do it all the time. Also, there’s a difference between physical lines in the buffer (i.e. runs of characters ending with an EOL marker) and soft lines (à la word wrap, etc.). Consider a modern word processor, where a paragraph is routinely a single “line” that wraps to the page margins. Of course, you can handle this either way.
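A brute-force scan for physical lines might look like the sketch below. It assumes the text is available as one contiguous run of characters with '\n' as the EOL marker (with a piece table you would walk the pieces instead), and the function names computeLineStarts and lineOfOffset are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string_view>
#include <vector>

// Scan the whole text once and record the offset at which each line begins.
std::vector<std::size_t> computeLineStarts(std::string_view text) {
    std::vector<std::size_t> starts{0};  // line 0 always begins at offset 0
    for (std::size_t i = 0; i < text.size(); ++i) {
        if (text[i] == '\n') starts.push_back(i + 1);
    }
    return starts;
}

// Map a character offset to a zero-based line number by binary search.
std::size_t lineOfOffset(const std::vector<std::size_t>& starts,
                         std::size_t offset) {
    assert(!starts.empty() && starts.front() == 0);
    auto it = std::upper_bound(starts.begin(), starts.end(), offset);
    return static_cast<std::size_t>(it - starts.begin()) - 1;
}
```

For the "hello\nWorld" example from the question, this yields line starts 0 and 6.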
Finally, for most operations on the keyboard, you can simply use relative positions (i.e. if you insert a new line, it’s straightforward to add a new line marker to a line array, since you already know where you are within the buffer). But when you do, say, a large paste operation of several lines, it’s likely faster to just cram it all in and recalculate the entire buffer (as an alternative, you could always break the paste up into lines and insert them one by one behind the scenes, just like normally typed lines).
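A rough sketch of that kind of relative update, assuming the line array holds sorted absolute start offsets as in the question. The function name onTextInserted is made up; it covers a single typed character and a multi-line paste with the same code.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string_view>
#include <vector>

// Update sorted line-start offsets after `inserted` was placed at offset pos.
void onTextInserted(std::vector<std::size_t>& starts, std::size_t pos,
                    std::string_view inserted) {
    assert(std::is_sorted(starts.begin(), starts.end()));

    // Lines that begin after the insertion point move right by the new length.
    auto it = std::upper_bound(starts.begin(), starts.end(), pos);
    std::size_t index = static_cast<std::size_t>(it - starts.begin());
    for (std::size_t j = index; j < starts.size(); ++j) {
        starts[j] += inserted.size();
    }

    // Every newline in the inserted text begins a new line right after itself.
    std::vector<std::size_t> fresh;
    for (std::size_t i = 0; i < inserted.size(); ++i) {
        if (inserted[i] == '\n') fresh.push_back(pos + i + 1);
    }
    starts.insert(starts.begin() + static_cast<std::ptrdiff_t>(index),
                  fresh.begin(), fresh.end());
}
```

The shift is still linear in the number of following lines, but it is a tight loop over integers, which is why brute force tends to be good enough here.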
For huge huge buffers, or slow slow computers, you may want to stop worrying so much about the global state at any one point (exactly how many lines are in the buffer, exactly what line you are on, etc.) and push that kind of recalculation into the background. Most likely the pause will be minor (but annoying if you’re typing), and the recalculation will catch up as soon as the human pauses to collect their thoughts. Clearly this can complicate the design, and you’ll likely be OK using brute force on modern hardware for the time being.
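As a simpler stand-in for a real background job, the sketch below defers the recalculation with a dirty flag: edits only mark the line index stale, and the scan runs when a line count is actually requested (for example from an idle handler). The class LazyLineIndex is hypothetical, and it stores the text in a std::string purely to stay self-contained.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Illustrative line index that is rebuilt lazily instead of on every edit.
class LazyLineIndex {
public:
    // Edits only mark the index stale; nothing is scanned while typing.
    void insert(std::size_t pos, const std::string& s) {
        assert(pos <= text_.size());
        text_.insert(pos, s);
        dirty_ = true;
    }

    // Call when idle, or when the global line count is really needed.
    std::size_t lineCount() {
        refresh();
        return starts_.size();
    }

    bool stale() const { return dirty_; }

private:
    // Rebuild the line starts with one full scan, only if something changed.
    void refresh() {
        if (!dirty_) return;
        starts_.clear();
        starts_.push_back(0);
        for (std::size_t i = 0; i < text_.size(); ++i) {
            if (text_[i] == '\n') starts_.push_back(i + 1);
        }
        dirty_ = false;
    }

    std::string text_;
    std::vector<std::size_t> starts_{0};
    bool dirty_ = false;
};
```

Moving refresh() onto a worker thread would be the next step, at the cost of the synchronization complexity mentioned above.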