A legacy application has a huge StringBuffer (sometimes up to 1 MB in size) that is processed sequentially to modify its contents. I have to implement a change in which I need to update the StringBuffer to remove lines starting with certain specific words. What are the possible ways to implement this?
Ex:
ABC:djfk kdjf kdsjfk#
ABC:jfue eijf iefe#
DEL:kdjfi efe eei #
DEL:ieeif dfddf dfdf#
HJU:heuir fwer ouier#
ABC:dsf erereree ererre #
I need to delete the lines starting with DEL. Converting the StringBuffer to a String, splitting it into lines, processing them, and joining the result back into a StringBuffer would be a bit costly. Please let me know the possible solutions.
Thanks
It is possible to do this in-place efficiently. You’d have to overwrite the characters in the buffer at the proper intervals, and you’d then logically shorten the buffer with setLength. It’s going to be quite complex, but it would be in-place and O(N).
The reason why you’d want to overwrite instead of using delete/insert is because that would be O(N^2) instead, because things need to be shifted around unnecessarily.
Doing this out-of-place is quite trivial and O(N) but would require a secondary buffer, doubling the space requirement.
Proof-of-concept
Here’s a simple proof-of-concept.
removeIntervals takes a StringBuffer and an int[][] intervals. Each int[] is assumed to be a pair of { start, end } values (half-open range, exclusive upper bound). In linear time and in-place, these intervals are removed from the StringBuffer by a simple overwrite. This works when intervals are sorted and non-overlapping, and processed left-to-right.
The buffer is then shortened with setLength, cutting off as many characters as were removed.
Then we can test it as follows:
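As a rough sketch of what removeIntervals and a small test could look like, following the description above: the class name RemoveIntervalsDemo, the variable names, and the sample string are assumptions made for illustration, and the intervals are assumed to be sorted, non-overlapping, and within bounds.

```java
public class RemoveIntervalsDemo {

    // Removes the half-open ranges [start, end) from sb in place.
    // Assumes intervals are sorted, non-overlapping, and within bounds.
    static void removeIntervals(StringBuffer sb, int[][] intervals) {
        int write = 0; // next position to write a kept character to
        int read = 0;  // next position to read from
        for (int[] interval : intervals) {
            int start = interval[0];
            int end = interval[1];
            // Copy the kept characters that precede this interval.
            while (read < start) {
                sb.setCharAt(write++, sb.charAt(read++));
            }
            // Skip over the interval itself.
            read = end;
        }
        // Copy whatever follows the last interval.
        while (read < sb.length()) {
            sb.setCharAt(write++, sb.charAt(read++));
        }
        // Logically shorten the buffer by the number of removed characters.
        sb.setLength(write);
    }

    public static void main(String[] args) {
        StringBuffer sb = new StringBuffer("abcdefghij");
        // Remove "ab", "e", and "ij".
        removeIntervals(sb, new int[][] { { 0, 2 }, { 4, 5 }, { 8, 10 } });
        System.out.println(sb); // cdfgh
    }
}
```

Each character is read and written at most once, which is where the linear bound comes from; no delete or insert call is involved, so nothing is shifted more than once.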
This prints (as seen on ideone.com):
Getting the intervals
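For the input in the question, a preliminary pass with indexOf might collect the { start, end } pairs roughly like this. It is only a sketch: findLineIntervals and startsWith are hypothetical helper names, and the code assumes that lines are separated by '\n' and that the unwanted lines start with the literal prefix DEL:.

```java
import java.util.ArrayList;
import java.util.List;

public class LineIntervals {

    // Hypothetical helper: returns { start, end } pairs (end exclusive) for
    // every line of sb that starts with prefix. Assumes '\n' line separators.
    static int[][] findLineIntervals(StringBuffer sb, String prefix) {
        List<int[]> intervals = new ArrayList<>();
        int lineStart = 0;
        while (lineStart < sb.length()) {
            int newline = sb.indexOf("\n", lineStart);
            // The last line may lack a trailing newline.
            int lineEnd = (newline < 0) ? sb.length() : newline + 1;
            if (startsWith(sb, prefix, lineStart)) {
                intervals.add(new int[] { lineStart, lineEnd });
            }
            lineStart = lineEnd;
        }
        return intervals.toArray(new int[0][]);
    }

    // Checks for prefix at offset without creating a substring.
    private static boolean startsWith(StringBuffer sb, String prefix, int offset) {
        if (offset + prefix.length() > sb.length()) {
            return false;
        }
        for (int i = 0; i < prefix.length(); i++) {
            if (sb.charAt(offset + i) != prefix.charAt(i)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        StringBuffer sb = new StringBuffer("ABC:one#\nDEL:two#\nDEL:three#\nHJU:four#\n");
        for (int[] interval : findLineIntervals(sb, "DEL:")) {
            System.out.println(interval[0] + ".." + interval[1]);
        }
    }
}
```

The intervals come out sorted and non-overlapping because the buffer is scanned left to right, one line at a time.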
In this specific case, the intervals can either be built in a preliminary pass (using indexOf), or the whole process can be done in one pass if absolutely required. The point is that this can definitely be done in-place in linear time (and if absolutely necessary, in a single pass).
An out-of-place solution
This is out-of-place using a secondary buffer and regex. It’s offered for consideration due to its simplicity. Unless further optimization is provably required (after evidentiary profiling results), this should be good enough:
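A minimal sketch of such an out-of-place version, assuming the lines are separated by \n or \r\n and that the unwanted lines start with DEL: as in the question. The class name RemoveLinesRegex, the method removeDelLines, and the exact pattern are illustrative assumptions.

```java
import java.util.regex.Pattern;

public class RemoveLinesRegex {

    // (?m) lets ^ match at the start of every line. The pattern consumes a
    // whole line starting with DEL:, plus its line terminator if present.
    private static final Pattern DEL_LINE =
            Pattern.compile("(?m)^DEL:.*(?:\\r?\\n)?");

    // Returns a new buffer without the DEL: lines; the input is left unchanged.
    static StringBuffer removeDelLines(StringBuffer sb) {
        // matcher() accepts any CharSequence, so the input is not copied
        // first; replaceAll still builds a new String for the result.
        return new StringBuffer(DEL_LINE.matcher(sb).replaceAll(""));
    }

    public static void main(String[] args) {
        StringBuffer sb = new StringBuffer(
                "ABC:djfk kdjf kdsjfk#\n"
              + "ABC:jfue eijf iefe#\n"
              + "DEL:kdjfi efe eei #\n"
              + "DEL:ieeif dfddf dfdf#\n"
              + "HJU:heuir fwer ouier#\n"
              + "ABC:dsf erereree ererre #\n");
        System.out.print(removeDelLines(sb));
    }
}
```

The cost of this approach is the second copy of the data: for a 1 MB buffer, roughly another 1 MB of characters is allocated while the result is built.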
This prints (as seen on ideone.com):
References
java.util.regex.Pattern