I have a fairly long and complex HTML document, and I need to find all occurrences of a given string, e.g. “foobar”, unless it’s between <a> and </a> anchor tags.
The trouble is: it could be inside some text between the anchor tags, e.g.
<a>this is a foobar test</a>
and even in this case, I should not find the match.
How can I do that with a regex? I would have no trouble finding <a>foobar</a> and so on, but finding every “foobar” except when it’s between the anchor tags, possibly surrounded by a lot of other text, seems a bit tricky…
Any ideas??
ANSWER:
We ended up using this Regex to solve this problem – just in case anyone is a) curious, or b) in the same place 🙂
(?<!\<A.*(?=\<\/A))Test(?!\<\/A.*(?=\<A))
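This works for me in the simplest case (here with “Test” as the search string). It’s obviously not resistant to having other tags within the <a> tag. Note that the lookbehind contains .*, so it needs a regex engine that allows variable-length lookbehind, such as .NET.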
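For anyone who can't use lookbehind like that, here is a minimal sketch of a different approach in Python, not the regex we used above: match whole <a>…</a> elements first and throw them away, and only capture the search string when it turns up outside of one. The function name find_outside_anchors and the sample string are made up for illustration. It assumes anchors are not nested and that the search string never appears inside another tag's attributes.

```python
import re

def find_outside_anchors(html, needle):
    """Return start offsets of `needle` that are not inside <a>...</a>.

    Hypothetical helper for illustration only; assumes anchor elements
    are well-formed and not nested.
    """
    pattern = re.compile(
        r"<a\b[^>]*>.*?</a\s*>"              # a whole anchor element: matched, then ignored
        r"|(" + re.escape(needle) + r")",    # otherwise the needle itself, captured in group 1
        re.IGNORECASE | re.DOTALL,           # note: this makes the needle case-insensitive too
    )
    # Group 1 is only set when the second alternative matched,
    # i.e. when the needle was found outside any anchor element.
    return [m.start(1) for m in pattern.finditer(html) if m.group(1) is not None]


sample = 'foobar <a href="#">this is a foobar test</a> and foobar again'
offsets = find_outside_anchors(sample, "foobar")
print(offsets)
```

The trick is that the regex engine consumes each anchor element as one match, so any “foobar” inside it is never examined on its own. For real-world HTML with nested or malformed tags, an HTML parser is likely the safer choice.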