In my work I have with great results used approximate string matching algorithms such as Damerau–Levenshtein distance to make my code less vulnerable to spelling mistakes.
Now I have a need to match strings against simple regular expressions such TV Schedule for \d\d (Jan|Feb|Mar|...). This means that the string TV Schedule for 10 Jan should return 0 while T Schedule for 10. Jan should return 2.
This could be done by generating all strings in the regex (in this case 100×12) and find the best match, but that doesn’t seam practical.
Do you have any ideas how to do this effectively?
I found the TRE library, which seems to be able to do exactly fuzzy matching of regular expressions. Example: http://hackerboss.com/approximate-regex-matching-in-python/
It only supports insertion, deletion and substitution though. No transposition. But I guess that works ok.
I tried the accompanying agrep tool with the regexp on the following file:
and got
Thanks a lot for all your suggestions.