Question:
How do I use Python’s regular expression module (re) to determine if a match has been made, or that a potential match could be made?
Details:
I want a regex pattern which searches for a pattern of words in a correct order regardless of what’s between them. I want a function which returns Yes if found, Maybe if a match could still be found or No if no match can be found. We are looking for the pattern One|....|Two|....|Three, here are some examples (Note the names, their count, or their order are not important, all I care about is the three words One, Two and Three, and the acceptable words in between are John, Malkovich, Stamos and Travolta).
Returns YES:
One|John|Malkovich|Two|John|Stamos|Three|John|Travolta
Returns YES:
One|John|Two|John|Three|John
Returns YES:
One|Two|Three
Returns MAYBE:
One|Two
Returns MAYBE:
One
Returns NO:
Three|Two|One
I understand the examples are not airtight, so here is what I have for the regex to get YES:
if re.match('One\|(John\||Malkovich\||Stamos\||Travolta\|)*Two\|(John\||Malkovich\||Stamos\||Travolta\|)*Three\|(John\||Malkovich\||Stamos\||Travolta\|)*', 'One|John|Malkovich|Two|John|Stamos|Three|John|Travolta') != None
return 'Yes'
Obviously if the pattern is Three|Two|One the above will fail, and we can return No, but how do I check for the Maybe case? I thought about nesting the parentheses, like so (note, not tested)
if re.match('One\|((John\||Malkovich\||Stamos\||Travolta\|)*Two(\|(John\||Malkovich\||Stamos\||Travolta\|)*Three\|(John\||Malkovich\||Stamos\||Travolta\|)*)*)*', 'One|John|Malkovich|Two|John|Stamos|Three|John|Travolta') != None
return 'Yes'
But I don’t think that will do what I want it to do.
More Details:
I am not actually looking for Travoltas and Malkovichs (shocking, I know). I am matching against inotify Patterns such as IN_MOVE, IN_CREATE, IN_OPEN, and I am logging them and getting hundreds of them, then I go in and then look for a particular pattern such as IN_ACCESS…IN_OPEN….IN_MODIFY, but in some cases I don’t want an IN_DELETE after the IN_OPEN and in others I do. I’m essentially pattern matching to use inotify to detect when text editors gone wild and they try to crush programmers souls by doing a temporary-file-swap-save instead of just modifying the file. I don’t want to free up those logs instantly, but I only want to hold on to them for as long as is necessary. Maybe means dont erase the logs. Yes means do something then erase the log and No means don’t do anything but still erase the logs. As I will have multiple rules for each program (ie. vim v gedit v emacs) I wanted to use a regular expression which would be more human readable and easier to write then creating a massive tree, or as user Joel suggested, just going over the words with a loop
Perhaps an algorithm like this would be more appropriate. Here is some pseudocode.