I’m looking to match all text in the format foo:12345 that is not contained within an HTML anchor. For example, I’d like to match lines 1 and 3 from the following:
foo:123456
<a href="http://www.google.com">foo:123456</a>
foo:123456
I’ve tried these regexes with no success:
Negative lookahead attempt (incorrectly matches, but doesn’t include the last digit)
foo:(\d+)(?!</a>)
Negative lookahead with non-capturing grouping
(?:foo:(\d+))(?!</a>)
Negative lookbehind attempt (wildcards don’t seem to be supported)
(?<!<a[^>]>)foo:(\d+)
Regex is usually not the best tool for the job, but if your case is very specific like in your example you could use:
Your first expression didn’t work because \d+ would backtrack until (?!</a>) matches. This can be fixed by not allowing \d+ to backtrack, as above with the help of an atomic (non-backtracking) group, or you could make the lookahead fail in case \d+ backtracks, although that is not as efficient.
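To make the backtracking issue concrete, here is a minimal sketch in Python. The question doesn't name a regex flavor, so Python's re module is an assumption, and the names SAMPLE, NAIVE, GUARDED and find_ids are made up for illustration. It shows only the second approach (making the lookahead fail when \d+ backtracks), by also rejecting a following digit. An atomic group such as (?>\d+) would be the other route, but it isn't shown here because not every engine or version supports it.

```python
import re

# The three sample lines from the question.
SAMPLE = (
    'foo:123456\n'
    '<a href="http://www.google.com">foo:123456</a>\n'
    'foo:123456'
)

# First attempt from the question: \d+ can give back its last digit,
# after which the next character is "6", not "</a>", so the lookahead passes.
NAIVE = re.compile(r'foo:(\d+)(?!</a>)')

# Assumed fix: the lookahead also rejects a following digit, so a
# backtracked (shortened) \d+ can never satisfy it.
GUARDED = re.compile(r'foo:(\d+)(?!\d|</a>)')


def find_ids(pattern, text):
    """Return the captured digit runs for every match of pattern in text."""
    return pattern.findall(text)


print(find_ids(NAIVE, SAMPLE))    # ['123456', '12345', '123456']
print(find_ids(GUARDED, SAMPLE))  # ['123456', '123456']
```

Note that this only checks whether the match is immediately followed by </a>, just like the original attempts. It won't notice link text with anything between the number and the closing tag.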