I’m writing a program in .NET where the user may provide a large number of regular expressions. For a given string, I need to figure out which regular expression matches it (if more than one matches, I only need the first one). However, with a large number of regular expressions, this operation can take a very long time.
I was hoping there would be something similar to flex (the Fast Lexical Analyzer, not Adobe Flex) for .NET that would let me specify a large number of regular expressions yet quickly figure out which one matches (in O(n) time according to Wikipedia, where n is the length of the input string).
Also, I would prefer not to implement my own regular expression engine :).
Find the longest chunk of constant text in each regex (if it is above a certain threshold length) and use the Karp-Rabin algorithm to search for all of those strings simultaneously. For each string that is found, run the corresponding regex to see whether the whole pattern matches. Any regex not included in the multi-string search has to be run against the input directly.
This should give you good performance for a large number of regular expressions if they have reasonable-length constant substrings, assuming you have preprocessing time available for the regular expressions.
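To make the idea concrete, here is a rough sketch of the prefilter. It is written in Python only for brevity; the same structure should carry over to .NET's Regex class. All names (`longest_literal`, `PrefilteredMatcher`, `first_match`) are made up for illustration. Two simplifying assumptions: the literal extractor is deliberately conservative and gives up on any pattern containing groups, classes, escapes or alternation, and each literal is truncated to its first `k` characters, because a single Karp-Rabin rolling hash needs keys of equal length.

```python
import re

BASE, MOD = 257, (1 << 61) - 1  # rolling-hash parameters (arbitrary choice)

def _hash(s):
    h = 0
    for ch in s:
        h = (h * BASE + ord(ch)) % MOD
    return h

def longest_literal(pattern, min_len):
    """Return text every match must contain, or None (conservative heuristic)."""
    if any(c in pattern for c in "|\\[](){}"):
        return None  # too complex for this sketch: always run the regex
    best = ""
    for m in re.finditer(r"[^.^$*+?]+", pattern):
        run = m.group()
        # A character followed by a quantifier may repeat or vanish: drop it.
        if pattern[m.end():m.end() + 1] in ("*", "+", "?"):
            run = run[:-1]
        if len(run) > len(best):
            best = run
    return best if len(best) >= min_len else None

class PrefilteredMatcher:
    def __init__(self, patterns, k=3):
        self.k = k
        self.compiled = [re.compile(p) for p in patterns]
        self.table = {}    # hash of k-char key -> [(key, regex index)]
        self.always = []   # regexes with no usable literal
        for i, p in enumerate(patterns):
            lit = longest_literal(p, k)
            if lit is None:
                self.always.append(i)
            else:
                key = lit[:k]
                self.table.setdefault(_hash(key), []).append((key, i))

    def first_match(self, text):
        """Index of the first regex (in original order) that matches, or None."""
        k, candidates = self.k, set(self.always)
        if len(text) >= k:
            high = pow(BASE, k - 1, MOD)
            h = _hash(text[:k])
            for i in range(len(text) - k + 1):
                if i > 0:  # slide the window one character to the right
                    h = ((h - ord(text[i - 1]) * high) * BASE
                         + ord(text[i + k - 1])) % MOD
                for key, idx in self.table.get(h, ()):
                    if text.startswith(key, i):  # guard against collisions
                        candidates.add(idx)
        for idx in sorted(candidates):  # preserve "first one wins" order
            if self.compiled[idx].search(text):
                return idx
        return None
```

Usage would look something like `PrefilteredMatcher([r"error: .*timeout", r"^\d+$"]).first_match(line)`. Only the regexes whose literal shows up in the input, plus the ones with no usable literal, are actually run, so the benefit depends on how many patterns fall into the second group.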