I am writing a program that has to strip quite a lot of rubbish from fairly long strings. I do that with regular expressions, and since my program is rather speed-sensitive, I need to know which approach is faster: using a number of consecutive, relatively simple regular expressions, or using a single but quite complex one?
Best regards,
Timofey.
You need to benchmark this stuff to know for certain, and be sure to blog your results. I suspect one big regex will be quicker than many small ones, but I’m curious to see what you find out.
The java.util.regex.Pattern class is pretty complex, and I don’t pretend to know what optimizations it performs. I do know regexes compile into a graph, so an obvious one would be to combine overlapping paths. The more variations you stuff into a single expression, the more such opportunities arise. It may also reduce the number of passes over the input data.
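A minimal sketch of the benchmark suggested above, assuming the "rubbish" is removed by replacing matches with an empty string. The three patterns (tags, character entities, `[n]` markers), the class name `RegexStripBenchmark`, and the input size are all hypothetical stand-ins; substitute your real expressions and data. It compares several `replaceAll` passes against one pass with the same patterns merged into an alternation.

```java
import java.util.List;
import java.util.regex.Pattern;

class RegexStripBenchmark {
    // Hypothetical "rubbish" patterns: HTML-like tags, character entities, [n] markers.
    static final List<Pattern> SIMPLE = List.of(
            Pattern.compile("<[^>]+>"),
            Pattern.compile("&[a-z]+;"),
            Pattern.compile("\\[\\d+\\]"));

    // The same three patterns merged into a single alternation.
    static final Pattern COMBINED = Pattern.compile("<[^>]+>|&[a-z]+;|\\[\\d+\\]");

    // Approach 1: one full pass over the string per simple pattern.
    static String stripSequential(String input) {
        String s = input;
        for (Pattern p : SIMPLE) {
            s = p.matcher(s).replaceAll("");
        }
        return s;
    }

    // Approach 2: a single pass with the combined pattern.
    // Note: this only gives the same result as approach 1 when removing one
    // kind of rubbish can never create a new match for another pattern.
    static String stripCombined(String input) {
        return COMBINED.matcher(input).replaceAll("");
    }

    // Runs the task `runs` times and returns the elapsed wall-clock time in ms.
    static long timeMillis(Runnable task, int runs) {
        long start = System.nanoTime();
        for (int i = 0; i < runs; i++) {
            task.run();
        }
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 20_000; i++) {
            sb.append("Hello <b>world</b>&amp; foo[12] bar ");
        }
        String input = sb.toString();

        // Sanity check: both approaches must agree on this input.
        if (!stripSequential(input).equals(stripCombined(input))) {
            throw new IllegalStateException("approaches disagree");
        }

        // Warm-up runs so the JIT compiler has a chance to optimize both paths.
        timeMillis(() -> stripSequential(input), 20);
        timeMillis(() -> stripCombined(input), 20);

        System.out.println("sequential: " + timeMillis(() -> stripSequential(input), 50) + " ms");
        System.out.println("combined:   " + timeMillis(() -> stripCombined(input), 50) + " ms");
    }
}
```

Hand-rolled `System.nanoTime` loops are only a rough guide on the JVM; for numbers worth blogging about, a harness such as JMH (the OpenJDK Java Microbenchmark Harness) handles warm-up and dead-code elimination more reliably. Whatever you measure, compile each `Pattern` once and reuse it, as above, so that compilation cost does not skew the comparison.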