I am doing some fairly extensive string manipulations using regular expressions in Java. Currently, I have many blocks of code that look something like:
    Matcher m = Pattern.compile("some pattern").matcher(text);
    StringBuilder b = new StringBuilder();
    int prevMatchIx = 0;
    while (m.find()) {
        b.append(text.substring(prevMatchIx, m.start()));
        String matchingText = m.group(); // sometimes group(n)
        // manipulate the matching text
        b.append(matchingText);
        prevMatchIx = m.end();
    }
    text = b.toString() + text.substring(prevMatchIx);
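For comparison, a loop of this shape can also be sketched with Matcher.appendReplacement and Matcher.appendTail, which do the substring bookkeeping themselves. The class name, the transform helper and the upper-casing step are placeholders I made up for illustration, not part of the real code:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class AppendReplacementSketch {
    // Hypothetical helper: upper-cases every match of the given pattern.
    static String transform(String text, Pattern pattern) {
        Matcher m = pattern.matcher(text);
        // StringBuffer works on every Java version; StringBuilder overloads exist from Java 9.
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            // Stand-in for the real manipulation of the matching text.
            String replacement = m.group().toUpperCase();
            // quoteReplacement stops '$' and '\' in the result from being read as group references.
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out); // copies whatever follows the last match
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(transform("a cat and a dog", Pattern.compile("cat|dog")));
    }
}
```

Whether this is any faster than the manual version is something I have not measured; it mainly removes the prevMatchIx bookkeeping.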
My question is which of the two alternatives is more efficient (primarily time, but space to some extent):
1) Keep many existing blocks as above (assuming there isn’t a better way to handle such blocks; I can’t use a simple replaceAll() because the groups must be operated on).
2) Consolidate the blocks into one big block whose "some pattern" combines all the old blocks’ patterns with the | (alternation) operator, then use if/else if within the loop to handle whichever pattern matched.
Thank you for your help!
If the order in which the replacements are made matters, you have to be careful when using technique #1. Allow me to give an example: if I want to format a String so it is suitable for inclusion in XML, I have to first replace every & with &amp; and only then make the other replacements (like < to &lt;). Using technique #2, you would not have to worry about this because you are making all the replacements in one pass.
In terms of performance, I think #2 would be quicker because you would be doing fewer String concatenations. As always, you could implement both techniques and record their speed and memory consumption to find out for certain. 🙂
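To make the ordering point concrete, here is a minimal sketch of the XML case. It assumes only the three characters &, < and > need escaping; the class and method names are hypothetical. escapeOnePass follows technique #2 with one capturing group per alternative, while escapeWrongOrder shows what goes wrong when separate passes run in the wrong order:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class XmlEscapeSketch {
    // One capturing group per alternative, so the loop can tell which one matched.
    private static final Pattern SPECIAL = Pattern.compile("(&)|(<)|(>)");

    // Technique #2: a single pass over the text with if/else if on the groups.
    static String escapeOnePass(String text) {
        Matcher m = SPECIAL.matcher(text);
        StringBuilder out = new StringBuilder();
        int prev = 0;
        while (m.find()) {
            out.append(text, prev, m.start()); // unchanged text before this match
            if (m.group(1) != null) {
                out.append("&amp;");
            } else if (m.group(2) != null) {
                out.append("&lt;");
            } else {
                out.append("&gt;");
            }
            prev = m.end();
        }
        return out.append(text, prev, text.length()).toString();
    }

    // Technique #1 with the passes in the wrong order: the final pass
    // re-escapes the '&' characters introduced by the earlier passes.
    static String escapeWrongOrder(String text) {
        return text.replace("<", "&lt;").replace(">", "&gt;").replace("&", "&amp;");
    }

    public static void main(String[] args) {
        String s = "a < b && c";
        System.out.println(escapeOnePass(s));    // a &lt; b &amp;&amp; c
        System.out.println(escapeWrongOrder(s)); // a &amp;lt; b &amp;&amp; c
    }
}
```

The single pass never rescans text it has already written, which is why the order of the alternatives does not affect the result here.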