For example, I want to classify strings matching c*t as CLASS1 and strings matching d*g as CLASS2:
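import java.util.regex.Pattern;
static final int CLASS1 = 1, CLASS2 = 2;
static final Pattern CXT = Pattern.compile("^c.*t$");
static final Pattern DXG = Pattern.compile("^d.*g$");
public int classify(String in) {
    // Pattern has no one-argument instance matches(); go through a Matcher.
    // Each check scans the input again, once per pattern.
    if (CXT.matcher(in).matches()) return CLASS1;
    if (DXG.matcher(in).matches()) return CLASS2;
    return -1; // no pattern matched
}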
This becomes very inefficient when there are many patterns, because every input is scanned once per pattern.
Assuming all patterns are orthogonal (no string matches more than one of them), a single pass through one DFA should be enough. Is there a regex processor that can combine all the patterns into one?
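As a first step, the patterns can at least be merged into one java.util.regex alternation, with one named group per class to recover which branch matched. This is only a sketch: the group names `c1`/`c2` are illustrative, and java.util.regex is a backtracking engine, so this is a single regex call but not a guaranteed single DFA pass.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class AlternationClassifier {
    static final int CLASS1 = 1, CLASS2 = 2;

    // One pattern, one named group per class (group names are illustrative).
    static final Pattern COMBINED = Pattern.compile("(?<c1>c.*t)|(?<c2>d.*g)");

    static int classify(String in) {
        Matcher m = COMBINED.matcher(in);
        if (!m.matches()) return -1;              // matches() requires the whole input to match
        if (m.group("c1") != null) return CLASS1; // the branch that did not match yields null
        if (m.group("c2") != null) return CLASS2;
        return -1;
    }

    public static void main(String[] args) {
        System.out.println(classify("cat"));
        System.out.println(classify("dog"));
        System.out.println(classify("cow"));
    }
}
```

Because `matches()` must consume the whole string, the explicit `^`/`$` anchors from the original patterns are not needed here.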
Take a look at the dk.brics.automaton package. It is not exactly what you're looking for, but it is a really fast finite-state automaton implementation under a BSD license.
With it you can build a single automaton that does the classification for you, which is faster than testing the regular expressions one at a time.
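To illustrate the kind of automaton meant here, the following is a dependency-free sketch of one combined DFA for the two example patterns, written out by hand. Each accepting state is tagged with the class it reports, so a single pass over the input both matches and classifies. The state names and the `next` transition function are hypothetical, and `.` is treated as any character (unlike java.util.regex, where `.` excludes line terminators by default). With dk.brics.automaton the automaton would be generated from the regular expressions instead; this sketch does not use that library's API.

```java
public class CombinedDfa {
    static final int CLASS1 = 1, CLASS2 = 2;

    // States of one DFA recognising both c.*t and d.*g (hand-built, hypothetical names).
    static final int START = 0, IN_C = 1, C_END_T = 2, IN_D = 3, D_END_G = 4, DEAD = 5;

    // Class reported if the input ends in each state; -1 marks a non-accepting state.
    static final int[] ACCEPT = {-1, -1, CLASS1, -1, CLASS2, -1};

    // Transition function: '.' is treated as "any character".
    static int next(int state, char ch) {
        switch (state) {
            case START:   return ch == 'c' ? IN_C : ch == 'd' ? IN_D : DEAD;
            case IN_C:
            case C_END_T: return ch == 't' ? C_END_T : IN_C;
            case IN_D:
            case D_END_G: return ch == 'g' ? D_END_G : IN_D;
            default:      return DEAD;
        }
    }

    // One left-to-right pass over the input, however many patterns were merged.
    static int classify(String in) {
        int state = START;
        for (int i = 0; i < in.length() && state != DEAD; i++) {
            state = next(state, in.charAt(i));
        }
        return ACCEPT[state];
    }

    public static void main(String[] args) {
        System.out.println(classify("cat"));
        System.out.println(classify("dog"));
        System.out.println(classify("dogs"));
    }
}
```

Adding a pattern means adding states and transitions rather than another full scan of the input; an automaton library automates that construction so you don't write the table by hand.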