Let’s say I have an arbitrary list of regexes (IList<Regex> lst; for example). Is there any way to find out which one matches earliest in the string?
Of course there is the straightforward solution of trying each one on the string and seeing which match has the lowest index, but this could be inefficient on long strings.
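For reference, the straightforward approach might look like the sketch below. It runs every regex over the whole input and keeps the match with the lowest Index; on ties, the regex that comes first in the list wins. The class and method names (EarliestMatch, Find) are hypothetical.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper: names are illustrative only.
static class EarliestMatch
{
    // Runs every regex over the whole input and returns the list position of
    // the regex whose match starts earliest, plus that match.
    // Returns (-1, Match.Empty) when nothing matches.
    public static (int RegexIndex, Match Found) Find(IList<Regex> list, string input)
    {
        int bestRegex = -1;
        Match best = Match.Empty;
        for (int i = 0; i < list.Count; i++)
        {
            Match m = list[i].Match(input);
            // Strict '<' keeps the earlier regex in the list on ties.
            if (m.Success && (bestRegex < 0 || m.Index < best.Index))
            {
                bestRegex = i;
                best = m;
            }
        }
        return (bestRegex, best);
    }
}
```

Every regex scans until its own first match (or the end of the input), which is exactly the cost on long strings that the question is worried about.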
I could also pull the pattern strings back out of each regex (Regex.ToString()) and concatenate them all together ("(regex1)|(regex2)|(regex3)"), but I find this an ugly solution, especially since it doesn't even indicate which regex matched.
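For comparison, a rough sketch of the concatenation approach, assuming named groups are acceptable: wrapping each pattern in a named group (the names r0, r1, ... and the CombinedRegex helper are hypothetical) lets you check Groups[...].Success to see which alternative matched. Two caveats: the RegexOptions of the individual Regex objects are not carried over, and numbered backreferences inside the patterns can end up pointing at a different group once everything is combined.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper: names are illustrative only.
static class CombinedRegex
{
    // Builds "(?<r0>pattern0)|(?<r1>pattern1)|..." from already-constructed
    // Regex objects (Regex.ToString() returns the original pattern).
    // Caveat: each Regex's own RegexOptions are NOT carried over.
    public static Regex Combine(IList<Regex> list) =>
        new Regex(string.Join("|", list.Select((rx, i) => $"(?<r{i}>{rx})")));

    // Returns the list position of the regex whose alternative produced `m`,
    // or -1 if `m` did not succeed.
    public static int WhichMatched(Match m, int count)
    {
        if (!m.Success) return -1;
        for (int i = 0; i < count; i++)
        {
            if (m.Groups["r" + i].Success) return i;
        }
        return -1;
    }
}
```

With the default (backtracking) engine, the combined pattern tries the alternatives from left to right at each starting position, so the reported match is the leftmost one; when two patterns can match at the same position, the one listed first wins.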
EDIT: Basically, is there a way to combine the already-compiled regexes without string manipulation and recompilation?
Executing one expression with many alternated groups can be slower than executing each expression in turn, so combining them is not automatically a win. A single combined expression might look faster, but the .NET engine backtracks: at each starting position it tries the alternatives from left to right, and only moves on to the next position when all of them fail. So a combined pattern does return the leftmost match (when two alternatives can match at the same position, the one listed first wins, not the longest), but it still evaluates every pattern at every position before that match.
Since another regex might match starting earlier in the string but run past the start of the current earliest match, you can’t simply cap the end of your search at the index of the current earliest match.
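A small hypothetical example of why that cap goes wrong: with the patterns b and a+b on the input "aab", the first regex matches at index 2, but a search for a+b limited to the first two characters ("aa") finds nothing, even though a+b matches the whole input starting at index 0. The CapPitfall name is illustrative only.

```csharp
using System.Text.RegularExpressions;

// Hypothetical demo: `b` is the current earliest match, `a+b` is tried next.
static class CapPitfall
{
    public static (bool CappedFound, bool FullFound, int FullIndex) Run(string input)
    {
        Match current = new Regex("b").Match(input);   // for "aab": index 2
        var next = new Regex("a+b");

        // Wrong: searches only input[0 .. current.Index), i.e. "aa" for "aab",
        // so a match that starts earlier but ends later is never seen.
        Match capped = next.Match(input, 0, current.Index);

        // Unrestricted search: finds "aab" starting at index 0.
        Match full = next.Match(input);

        return (capped.Success, full.Success, full.Index);
    }
}
```

CapPitfall.Run("aab") returns (False, True, 0): the capped search misses a match that starts before the current best.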
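Limiting the search does work when you know the maximum length each regex can match: a match that starts before the current earliest index must then end no later than that index plus the maximum length, so you can cap the search there.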
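A minimal sketch of the bounded search, assuming every match of every regex is at most maxLength characters long and that the patterns use no end anchors ($, \Z, \z), word boundaries (\b, \B) or lookarounds, which could behave differently at the artificial end of the search window. Under those assumptions, a match starting before best.Index ends by best.Index - 1 + maxLength, so searching that prefix finds the same match an unrestricted search would. The names (BoundedEarliest, maxLength) are hypothetical; the sketch also stops as soon as it finds a match at index 0.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper: names are illustrative only.
static class BoundedEarliest
{
    // Assumes every match is at most maxLength (>= 1) characters and that the
    // patterns contain no end anchors, \b/\B or lookarounds.
    public static (int RegexIndex, Match Found) Find(IList<Regex> list, string input, int maxLength)
    {
        int bestRegex = -1;
        Match best = Match.Empty;
        for (int i = 0; i < list.Count; i++)
        {
            // Before any match is known, search the whole input. Afterwards a
            // better match must start before best.Index, so it must end by
            // best.Index - 1 + maxLength.
            int length = bestRegex < 0
                ? input.Length
                : Math.Min(input.Length, best.Index - 1 + maxLength);

            Match m = list[i].Match(input, 0, length);
            if (m.Success && (bestRegex < 0 || m.Index < best.Index))
            {
                bestRegex = i;
                best = m;
                // Short-circuit: no other regex can start before index 0.
                if (best.Index == 0) break;
            }
        }
        return (bestRegex, best);
    }
}
```

If the maximum-length assumption is violated, a later regex can be cut off by the window and its earlier match missed, which is the same failure as the uncapped example.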
In either case, you can short-circuit as soon as you find a match at position 0, because no other regex can match earlier; that saves you any remaining executions.
With a little trickery, you can use the index of your current earliest match candidate to limit the search, but I suspect it will still be slower than searching all possible matches.
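One possible reading of this trick (not necessarily what was meant here) is to probe only the start positions before the current best index. The sketch below assumes it is acceptable to build an anchored copy of each pattern once, up front, as \G(?:pattern) with the same RegexOptions; that is still string manipulation and recompilation, just not per search. With Regex.Match(input, startat), \G only matches at startat, so each probe tests whether a regex can match starting exactly at that position, and the non-capturing wrapper leaves the pattern's group numbering unchanged. The names (ProbeEarliest, Anchor) are hypothetical, and RegexOptions.RightToLeft patterns are assumed not to be present.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper: names are illustrative only.
static class ProbeEarliest
{
    // Builds an anchored copy of each regex once. \G makes Match(input, startat)
    // succeed only at startat; (?:...) keeps the original group numbering.
    public static List<Regex> Anchor(IList<Regex> list) =>
        list.Select(rx => new Regex(@"\G(?:" + rx + ")", rx.Options)).ToList();

    public static (int RegexIndex, Match Found) Find(IList<Regex> anchored, string input)
    {
        int bestRegex = -1;
        Match best = Match.Empty;
        int bound = input.Length + 1;   // start positions that could still win: [0, bound)
        for (int i = 0; i < anchored.Count && bound > 0; i++)
        {
            // Probe each start position below the current bound, left to right.
            for (int pos = 0; pos < bound; pos++)
            {
                Match m = anchored[i].Match(input, pos);
                if (m.Success)
                {
                    bestRegex = i;
                    best = m;
                    bound = pos;        // later regexes must start strictly earlier
                    break;
                }
            }
        }
        return (bestRegex, best);
    }
}
```

This makes one anchored attempt per position per regex and loses the engine's own scanning optimizations, which is consistent with the suspicion that it ends up slower.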
In the end, experiment and measure to find the way that works best for you.
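If your target string can be really large, you can opt to first scan using a reasonable value for the length parameter, for example rx.Match(targetstring, 0, Math.Min(targetstring.Length, 1024) /* first scan */), and only widen your search in a second pass if you don’t find a match. This can save a lot of compute power, provided matches are short compared to the scan window: if a match starts inside the window but runs past its end, the first pass may report a truncated match or a later one instead.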
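A sketch of the two-pass idea for a single regex. The helper name TwoPass is hypothetical and the firstScan default of 1024 mirrors the value in the text; as noted, the first pass treats the window end as the end of the input, so this assumes short matches and no end anchors, \b/\B or lookarounds in the pattern.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: name and default window size are illustrative only.
static class TwoPass
{
    // First pass: only the first `firstScan` characters. Second pass, only if
    // the first one finds nothing: the whole input.
    public static Match Find(Regex rx, string input, int firstScan = 1024)
    {
        // Math.Min avoids ArgumentOutOfRangeException on short inputs.
        int window = Math.Min(input.Length, firstScan);
        Match m = rx.Match(input, 0, window);
        if (m.Success || window == input.Length)
            return m;          // found, or the window already covered everything
        return rx.Match(input);
    }
}
```

The second pass rescans the window as well; that keeps the sketch simple at the cost of repeating some work when the first pass fails.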