I’m trying to find a single regular expression that I can use to parse a block of HTML to find some specific text, but only if that text is not part of an existing hyperlink. I want to turn the non-links into links, which is easy, but identifying the non-linked ones with a single expression seems more troublesome. In the following example:
This problem is a result of BugID 12.
If you want more information, refer to <a href="/bug.aspx?id=12">BugID 12</a>.
I want a single expression to find “BugID 12” so I can link it, but I don’t want to match the second one because it’s already linked.
In case it matters, I’m using .NET’s regular expressions.
Since .NET supports negative lookaheads, you can use:
However, there is still the danger that BugID 12 will be inside an anchor like
But you can mostly overcome this with
Disclaimer: Parsing HTML with regex is not reliable and should only be done as a last resort, or in the simplest of cases. I’m sure there are plenty of instances where the above expression does not behave as desired (example: BugID 12</span></a>).
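The lookahead idea can be sketched in Python, whose `(?!...)` syntax matches .NET's. This is an illustration under stated assumptions, not the exact expressions from the answer above: NAIVE, REFINED, and link_bug_ids are hypothetical names, and the `[^<]*` refinement (skip plain text up to the next tag, then reject if that tag is `</a>`) is one possible way to handle longer anchor text. The last case shows the nested-tag failure the disclaimer warns about.

```python
import re

# Hypothetical naive pattern: reject a match only when "</a>" follows it immediately.
NAIVE = re.compile(r"BugID (\d+)\b(?!</a>)")

# Hypothetical refined pattern: [^<]* skips ordinary anchor text (e.g. " is a bug")
# up to the next tag, and the match is rejected if that tag is "</a>".
REFINED = re.compile(r"BugID (\d+)\b(?![^<]*</a>)")


def link_bug_ids(html: str, pattern: re.Pattern = REFINED) -> str:
    """Wrap unlinked 'BugID N' mentions in an anchor (hypothetical helper)."""
    # \1 is the captured bug number; \g<0> is the whole matched text.
    return pattern.sub(r'<a href="/bug.aspx?id=\1">\g<0></a>', html)


if __name__ == "__main__":
    sample = (
        "This problem is a result of BugID 12.\n"
        'If you want more information, refer to <a href="/bug.aspx?id=12">BugID 12</a>.'
    )
    # Only the first, unlinked mention gets wrapped.
    print(link_bug_ids(sample))

    longer = '<a href="/bug.aspx?id=12">BugID 12 is a bug</a>'
    print(link_bug_ids(longer, NAIVE))    # naive pattern links it a second time
    print(link_bug_ids(longer, REFINED))  # refined pattern leaves it unchanged

    nested = '<a href="/bug.aspx?id=12"><span>BugID 12</span></a>'
    # Still double-linked: the next tag is </span>, not </a>.
    print(link_bug_ids(nested))
```

In .NET, the same patterns should work with `Regex.Replace`, using `$1` and `$0` in place of Python's `\1` and `\g<0>` in the replacement string.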