For example, I have this string: "http://www.google.com/this_is_our_network/"
I want to match the word “work”, but only when there is no alphabetic character immediately before or after it. In the above example the regex should not return a match.
But in this string: "http://www.google.com/work_for_us.html" the regex should come up with a match, since there is no alphabetic character directly before or after “work”.
Try this regex:
    (?<=[\W_])work(?=[\W_])

This uses positive look-ahead and look-behind assertions to respect enclosing characters but without including them in the match.
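To check the pattern against the two URLs from the question, here is a minimal sketch using Python's standard `re` module. The names `PATTERN`, `urls` and `results` are only illustrative and not part of the original answer.

```python
import re

# The pattern from above: "work" with a non-word character or an
# underscore directly before and directly after it.
PATTERN = re.compile(r"(?<=[\W_])work(?=[\W_])")

urls = [
    "http://www.google.com/this_is_our_network/",
    "http://www.google.com/work_for_us.html",
]

# Map each URL to True/False depending on whether the pattern is found.
results = {url: bool(PATTERN.search(url)) for url in urls}

for url, matched in results.items():
    print(url, "->", matched)
```

The first URL should report no match (the "work" in "network" is preceded by a letter), while the second should match.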
This regex matches
    work
    preceded by a \W character or an underscore AND
    followed by a \W character or an underscore.

\b for word boundary matching can’t be used since _ matches \w, which is not wanted here.

Further examples:
Matching multiple words:
    (?<=[\W_])(work|job)(?=[\W_])

Same as above but without creating submatches:
    (?<=[\W_])(?:work|job)(?=[\W_])

Also respecting line end:
    (?<=[\W_])(?:work|job)(?=[\W_]|$)

Some useful notes regarding regex syntax:
    \w matches all alphanumeric characters and underscore; for ASCII text this is equivalent to [a-zA-Z0-9_]
    \W matches the exact opposite of \w
    \b matches boundaries between a \w and a \W character (or vice versa)

Positive look-ahead assertion:
    foo(?=bar) matches foo followed by bar, without including bar in the match.

Positive look-behind assertion:
    (?<=foo)bar matches bar if it follows foo, without including foo in the match.

For further information on (Python) regex syntax, consider the Python regex docs or the Perl regex docs. Also, the web-based Python Regex Tool is handy for testing.
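The differences between the variants discussed in this answer can be sketched in Python as follows. The sample strings "/job_offers" and "apply_for_work" and all variable names are made up for illustration.

```python
import re

text = "http://www.google.com/work_for_us.html"

# \b treats "_" as a word character, so there is no boundary between
# "work" and the following "_", and the search finds nothing.
word_boundary = re.search(r"\bwork\b", text)

# The look-around version accepts "_" as a delimiter and finds "work".
lookaround = re.search(r"(?<=[\W_])work(?=[\W_])", text)

# A capturing group creates a submatch; (?:...) groups without capturing.
capturing = re.search(r"(?<=[\W_])(work|job)(?=[\W_])", "/job_offers")
non_capturing = re.search(r"(?<=[\W_])(?:work|job)(?=[\W_])", "/job_offers")

# Without the "|$" alternative, a word at the very end is missed,
# because the look-ahead requires one more character.
at_end_strict = re.search(r"(?<=[\W_])(?:work|job)(?=[\W_])", "apply_for_work")
at_end_relaxed = re.search(r"(?<=[\W_])(?:work|job)(?=[\W_]|$)", "apply_for_work")

print(word_boundary, lookaround.group(0))
print(capturing.groups(), non_capturing.groups())
print(at_end_strict, at_end_relaxed.group(0))
```

In each pair, the first search shows the limitation and the second shows the variant that addresses it.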