I am trying to match part of a file path, but only if it does not include a certain keyword, using regular expressions in Python. For example, applying the regular expression to “/exclude/this/test/other” should not match, whereas “/this/test/other” should return the file path excluding “other”, i.e. “/this/test”, where “other” can be any directory. So far I am using this:
In [153]: re.findall("^(((?!exclude).)*(?=test).*)?", "/exclude/this/test/other")
Out[153]: [('', '')]
In [152]: re.findall("^(((?!exclude).)*(?=test).*)?", "/this/test/other")
Out[152]: [('/this/test/other', '/')]
but I can’t get it to stop matching after “test”, and there are also some empty matches. Any ideas?
You’re getting the extra result because (1) you’re using findall() instead of search(), and (2) you’re using capturing groups instead of non-capturing.
This will work with findall() too, but that doesn’t really make sense when you’re matching the whole string. More importantly, the include part of your regex doesn’t work. Check this:
That’s because the * in (?=test)* makes the lookahead optional, which makes it pointless. But getting rid of the * isn’t really a solution, because exclude and test might be part of longer words, like excludexx or yyytest. Here’s a better regex (tested):
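
The snippets that originally accompanied this answer are not preserved here. As a minimal sketch of the points above, the following contrasts findall() on the question's pattern with search() on a non-capturing, word-bounded variant. The names BOUNDED and match_path are made up for illustration, and the bounded pattern is an assumption, not necessarily the regex the answer originally showed.

```python
import re

# Pattern as posted in the question: two capturing groups, whole thing optional.
QUESTION_PATTERN = r"^(((?!exclude).)*(?=test).*)?"

# findall() returns one tuple per match, with one entry per capturing group.
print(re.findall(QUESTION_PATTERN, "/this/test/other"))

# Assumed variant (not the answer's original regex): non-capturing groups,
# \b word boundaries around both keywords, and no trailing "?".
BOUNDED = re.compile(r"^(?:(?!\bexclude\b).)*(?=\btest\b).*")


def match_path(path):
    """Return the matched text, or None if the path is rejected."""
    m = BOUNDED.search(path)
    return m.group(0) if m else None


for path in [
    "/this/test/other",          # accepted
    "/exclude/this/test/other",  # rejected: contains the word "exclude"
    "/excludexx/test/other",     # accepted: "excludexx" is a different word
    "/this/yyytest/other",       # rejected: "yyytest" is not the word "test"
]:
    print(path, "->", match_path(path))
```

This variant still matches the whole path rather than stopping after test; it only shows the search()/non-capturing and word-boundary points.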
EDIT: I see you fixed the “optional lookahead” problem, but now the whole regex is optional!
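
To see why an optional regex is a problem, here is a small sketch comparing the question's pattern (rewritten with non-capturing groups) with and without the trailing ?. The names OPTIONAL and REQUIRED are only for illustration.

```python
import re

# The question's pattern with non-capturing groups, still wrapped in (...)?
OPTIONAL = r"^(?:(?:(?!exclude).)*(?=test).*)?"
# The same pattern without the optional wrapper.
REQUIRED = r"^(?:(?!exclude).)*(?=test).*"

bad_path = "/exclude/this/test/other"

# The optional version "succeeds" by matching the empty string at position 0,
# so a simple truth test on the match object cannot reject the path.
m = re.search(OPTIONAL, bad_path)
print(m is not None, repr(m.group(0)))

# The required version fails outright, which is what the caller wants here.
print(re.search(REQUIRED, bad_path))
```

With the optional version you would have to check for an empty match explicitly, which is easy to forget.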
EDIT: If you want it to stop matching after /test, try this:
(?:/(?!test\b|exclude\b)\w+)* matches zero or more path components, as long as they’re not /test or /exclude.
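
The full regex from this edit is not preserved here, so the following is only a sketch built around the fragment quoted above. It assumes the fragment is anchored with ^ and followed by a literal /test\b. The names STOP_AFTER_TEST and path_up_to_test are hypothetical.

```python
import re

# Assumed reconstruction: the quoted fragment, anchored at the start and
# followed by the /test component itself.
STOP_AFTER_TEST = re.compile(r"^(?:/(?!test\b|exclude\b)\w+)*/test\b")


def path_up_to_test(path):
    """Return the path up to and including /test, or None if it is rejected."""
    m = STOP_AFTER_TEST.search(path)
    return m.group(0) if m else None


for path in [
    "/this/test/other",            # leading components, then /test
    "/exclude/this/test/other",    # rejected: /exclude comes before /test
    "/excludexx/this/test/other",  # "excludexx" is allowed as a component
    "/this/testing/other",         # rejected: there is no /test component
]:
    print(path, "->", path_up_to_test(path))
```

Note that \w+ only covers components made of letters, digits and underscores, so a directory like my-dir or v1.2 would need a wider character class such as [^/]+.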