I have a number of regex patterns. When a string is input, I have to find all the patterns that match this string. This is normally an O(n) operation in the number of patterns:
SELECT regex FROM regexes WHERE 'string' RLIKE regex
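Here is a minimal sketch of that full scan, so there is a baseline to compare against. It uses SQLite through Python's sqlite3 module as a stand-in for MySQL. SQLite has no built-in regex function, but if you register one named regexp(), it rewrites `X REGEXP Y` as `regexp(Y, X)`. The matching therefore uses Python's re flavour, not MySQL's. The table and column names are taken from the query above. Everything else is illustrative.

```python
import re
import sqlite3

def regexp(pattern: str, value: str) -> bool:
    # SQLite calls regexp(Y, X) for "X REGEXP Y", so the stored pattern
    # arrives first and the input string second. re.search (not re.match)
    # mirrors RLIKE, which matches anywhere in the string.
    return re.search(pattern, value) is not None

def matching_patterns(conn: sqlite3.Connection, text: str) -> list:
    # Full scan: regexp() runs once per stored row, i.e. O(n) in the
    # number of patterns, and no index can help.
    rows = conn.execute(
        "SELECT regex FROM regexes WHERE ? REGEXP regex ORDER BY rowid", (text,)
    )
    return [row[0] for row in rows]

conn = sqlite3.connect(":memory:")
conn.create_function("regexp", 2, regexp)
conn.execute("CREATE TABLE regexes (regex TEXT)")
conn.executemany(
    "INSERT INTO regexes VALUES (?)", [("^foo",), ("bar$",), ("^baz.*qux$",)]
)
print(matching_patterns(conn, "foobar"))  # ['^foo', 'bar$']
```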
What is the fastest way to do this? Are there database structures or storage systems that are optimized to do such an operation?
The short answer is ‘No.’ There is no index structure currently available on any DBMS platform that will index partial matches of a regex like this.
The long answer is that a leading constant on a wildcard match (e.g. 'foo_') can be used as a prefix for index matches. Many DBMS platforms will optimise this and use an index (if available) to resolve the prefix. However, this is nowhere near as clever as a full regex, and the index can only be used if you have a constant prefix.
The even longer answer is that there are algorithms such as Rete that will optimise partial matches like this. This might be applicable if you can express your matches as forward-chaining production rules rather than regular expressions.
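The constant-prefix trick normally works on the column side of a LIKE. In this question the direction is reversed: the patterns are stored and the string is given. The same idea can still prune work. A pattern anchored with ^ that starts with literal text can only match strings beginning with that text. Here is a rough in-memory sketch, assuming Python's re flavour. `literal_prefix` and `PrefixIndex` are hypothetical helpers, not part of any library.

The prefix extraction is deliberately conservative. It gives up on any pattern containing `|`, and it drops a character that is followed by a quantifier. Unanchored patterns fall into the '' bucket and are still tested every time, so this only helps if many of your patterns are anchored.

```python
import re
from collections import defaultdict

META = set(".^$*+?{}[]\\|()")   # characters that end a run of literal text
QUANTIFIERS = set("*+?{")       # make the preceding character optional/repeated

def literal_prefix(pattern: str) -> str:
    """Literal text every match must start with, or '' if none can be proven."""
    # Only ^-anchored patterns pin the start of the input (re.search, like
    # RLIKE, matches anywhere otherwise); alternation could bypass the prefix.
    if not pattern.startswith("^") or "|" in pattern:
        return ""
    prefix = []
    for ch in pattern[1:]:
        if ch in META:
            if ch in QUANTIFIERS and prefix:
                prefix.pop()  # e.g. '^ab?': the 'b' is not guaranteed
            break
        prefix.append(ch)
    return "".join(prefix)

class PrefixIndex:
    """Hypothetical helper: bucket compiled regexes by their literal prefix."""

    def __init__(self, patterns):
        self.buckets = defaultdict(list)  # literal prefix -> compiled regexes
        for p in patterns:
            self.buckets[literal_prefix(p)].append(re.compile(p))

    def match(self, text):
        hits = []
        # Only buckets keyed by some prefix of the input (including '') can
        # match, so every other regex is skipped without being run.
        for k in range(len(text) + 1):
            for rx in self.buckets.get(text[:k], ()):
                if rx.search(text):
                    hits.append(rx.pattern)
        return hits

patterns = ["^foo", r"^foobar\d+", "^bar", "baz"]
idx = PrefixIndex(patterns)
print(idx.match("foobar42"))  # ['^foo', '^foobar\\d+']
```

The cost per input becomes one dictionary lookup per prefix of the string, plus the regexes in the buckets that are hit. In a database, the equivalent would be to store the extracted prefix in an indexed column and query it with `IN (...)` over the input's prefixes.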
Rete works by computing partial matches and only evaluating rules that can be reached from the current partial match. That makes it more efficient than O(n) for matching n rules against a fact (more like O(log n), but I'm not sure of the exact time complexity).
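Full Rete is too large to show here, but its first stage, the alpha network, can be illustrated. Each distinct constant test is stored once and shared by every rule that uses it. Facts are dispatched to those tests by hashing rather than by scanning the rule list.

This sketch assumes that rules are plain (attribute, value) equality tests on a dict-shaped fact. `AlphaNetwork` is a hypothetical name. There is no beta/join network and no persistent working memory, so this is not Rete itself, only the condition-sharing idea at its front.

```python
from collections import defaultdict

class AlphaNetwork:
    """Hypothetical, much-simplified sketch of Rete-style shared constant tests."""

    def __init__(self, rules):
        # rules: rule name -> list of (attribute, value) equality tests.
        self.needed = {}                # rule name -> number of distinct tests
        self.index = defaultdict(list)  # shared test -> rules that use it
        for name, conds in rules.items():
            tests = set(conds)          # dedupe so the counts below are exact
            self.needed[name] = len(tests)
            for test in tests:
                self.index[test].append(name)

    def match(self, fact):
        # One hash lookup per attribute/value pair in the fact; rules that
        # share no satisfied test are never touched. Rules with no tests
        # never fire in this sketch.
        satisfied = defaultdict(int)
        for test in fact.items():
            for name in self.index.get(test, ()):
                satisfied[name] += 1
        return sorted(n for n, c in satisfied.items() if c == self.needed[n])

rules = {
    "cheap_red": [("colour", "red"), ("price", "low")],
    "red": [("colour", "red")],
    "blue": [("colour", "blue")],
}
net = AlphaNetwork(rules)
print(net.match({"colour": "red", "price": "low"}))  # ['cheap_red', 'red']
```

Here the work per fact is proportional to the number of (rule, test) pairs the fact actually satisfies, not to the total number of rules. That is the kind of saving described above, though this toy says nothing definite about Rete's exact complexity.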