I am using Java/Groovy to find (and extract) matches in a string with regular expressions. In terms of performance, what is the best way to find the matches of 200 or more regexes in a string of, let's say, 5000 characters? In a nutshell, is it possible to avoid scanning the string once for each regex?
I can use the Pattern and Matcher classes provided by Java, but then I have to compile 200 patterns and run a matcher over the string 200 times. Is that the only way to do it?
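For reference, here is a minimal sketch of the baseline the question describes. It assumes Java 9+ (for `List.of`), and the class and method names (`NaiveMultiRegex`, `compileAll`, `findAll`) are illustrative only. Each pattern is compiled once and reused, but the input is still scanned once per pattern:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NaiveMultiRegex {
    // Compile every regex once up front; compiling inside the search loop would be wasted work.
    static List<Pattern> compileAll(List<String> regexes) {
        List<Pattern> patterns = new ArrayList<>(regexes.size());
        for (String r : regexes) {
            patterns.add(Pattern.compile(r));
        }
        return patterns;
    }

    // Scans the input once per pattern and collects every match, grouped by pattern order.
    static List<String> findAll(List<Pattern> patterns, String input) {
        List<String> matches = new ArrayList<>();
        for (Pattern p : patterns) {
            Matcher m = p.matcher(input);
            while (m.find()) {
                matches.add(m.group());
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        List<Pattern> patterns = compileAll(List.of("\\d+", "[A-Z][a-z]+"));
        // prints [3, 12, Alice, Bob]
        System.out.println(findAll(patterns, "Alice has 3 cats and Bob has 12"));
    }
}
```

With 200 patterns this means 200 passes over the 5000-character string, which is the cost the question is trying to avoid.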
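If your regexes do not have common matches (no two of them can match overlapping text), you can always combine them into one gigantic regex using alternation, e.g. `regex1|regex2|regex3`, so the matcher runs over the string only once.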
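A minimal sketch of the alternation approach, assuming Java 9+. The helper names (`CombinedRegex`, `combine`, `findAll`) and the group names `g0`, `g1`, ... are hypothetical. Each regex is wrapped in its own named group so that, after a match, you can tell which original regex produced it. This assumes the input regexes do not already define groups with those names and do not rely on numbered backreferences, since the extra wrapping groups shift group numbers. Note that `java.util.regex` is a backtracking engine that still tries the alternatives in order at each start position, so the speedup over 200 separate scans is not guaranteed; it is worth measuring on your real patterns.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CombinedRegex {
    // Hypothetical helper: wraps regex i in a named group "g<i>" and joins them with '|'.
    // Assumes the inputs use no group names g0, g1, ... and no numbered backreferences.
    static Pattern combine(List<String> regexes) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < regexes.size(); i++) {
            if (i > 0) sb.append('|');
            sb.append("(?<g").append(i).append('>').append(regexes.get(i)).append(')');
        }
        return Pattern.compile(sb.toString());
    }

    // Single pass over the input; each result is tagged "index:text" with the
    // index of the original regex whose alternative matched.
    static List<String> findAll(Pattern combined, int count, String input) {
        List<String> results = new ArrayList<>();
        Matcher m = combined.matcher(input);
        while (m.find()) {
            for (int i = 0; i < count; i++) {
                String g = m.group("g" + i);
                if (g != null) {
                    results.add(i + ":" + g);
                    break; // exactly one top-level alternative participates in a match
                }
            }
        }
        return results;
    }

    public static void main(String[] args) {
        List<String> regexes = List.of("\\d+", "[A-Z][a-z]+");
        Pattern p = combine(regexes);
        // prints [1:Alice, 0:3, 1:Bob, 0:12]
        System.out.println(findAll(p, regexes.size(), "Alice has 3 cats and Bob has 12"));
    }
}
```

Because the first alternative that matches at a position wins, any overlapping match from a later regex is silently skipped. This is why the approach only works when the regexes have no common matches.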
However, given the complexity of your problem, I think you should consider switching from regexes to a proper scanner/parser combination. It will take time upfront, but the resulting solution will be much more manageable. Why don't you check out ANTLR?