Trying to answer this question, I created this Python regular expression to match any egg substring followed by a digit that is not part of a URL preceded by http://:
>>> import re
>>> r = re.compile(r'(?:\s(?!http://\S*))egg\d')
Then I applied it to the following string:
>>> a = "a egg1 http://egg2.com egg3 http://www.egg4.org egg5"
The result is:
>>> r.findall(a)
[' egg1', ' egg3', ' egg5']
The regular expression has plenty of other problems, but one thing bugged me more: why does the whitespace appear in the result? Since I used a lookahead assertion like (?:\s...), shouldn’t it be taken out of the resulting strings?
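(?:...) isn’t a lookahead assertion; it’s simply a non-capturing pair of parentheses (i.e. what the sub-regex inside matches doesn’t go into its own group; the parentheses exist only for grouping). Whatever it matches is still consumed and stays part of the overall match, which is why the whitespace shows up in your results. (?=...) is a lookahead assertion.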
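To make the difference concrete, here is a small sketch on the string from the question. The pattern names (`non_capturing`, `lookahead`, `lookbehind`, `capturing`) are only illustrative, and the simplified patterns leave out the http:// check from the question. Since the whitespace sits before egg, a lookbehind `(?<=...)` or a capturing group is probably what you want here, rather than a lookahead:

```python
import re

a = "a egg1 http://egg2.com egg3 http://www.egg4.org egg5"

# Non-capturing group: \s is consumed, so it is part of every match.
non_capturing = re.compile(r'(?:\s)egg\d')
print(non_capturing.findall(a))   # [' egg1', ' egg3', ' egg5']

# Lookahead: the digit is checked but not consumed, so it is absent from the match.
lookahead = re.compile(r'egg(?=\d)')
print(lookahead.findall(a))       # ['egg', 'egg', 'egg', 'egg', 'egg']

# Lookbehind: the whitespace is checked but not consumed.
lookbehind = re.compile(r'(?<=\s)egg\d')
print(lookbehind.findall(a))      # ['egg1', 'egg3', 'egg5']

# Capturing group: findall returns only the group, not the whole match.
capturing = re.compile(r'\s(egg\d)')
print(capturing.findall(a))       # ['egg1', 'egg3', 'egg5']
```

Note that findall returns the whole match when the pattern has no capturing groups, and only the group contents when it has exactly one, which is why the last variant also drops the whitespace.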