I have a task where I need to match a specified piece of text, but only when it is not inside a heading (of any level) or anchor HTML element (<h*></h*> and <a></a>), and not inside a tag as an attribute value.
For example, given this text:
<h1>TfL</h1>
<a href="tfl.gov.uk">Tfl</a>
TfL is official organization for keeping London moving.
Is it possible to match “TfL” only outside those tags using regular expressions?
Many thanks.
Peter.
I ended up selecting nodes with HtmlAgilityPack.HtmlDocument.SelectNodes() and then checking, for each selected node, whether it is one of the excluded tags or is nested inside one (by walking up through its parents recursively).
The IsNotOrNestedInSpecifiedNode function used in the code above:
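The original C# listings are not reproduced in this post. As a rough illustration of the same idea only (parse the HTML, then skip any text that sits inside an excluded element), here is a minimal sketch in Python using the standard library's html.parser rather than HtmlAgilityPack. Every name in it (EXCLUDED, OutsideTagMatcher, find_outside) is hypothetical and is not taken from the original code. Instead of walking up through parent nodes, it keeps a counter of how many excluded elements are currently open, which answers the same "is this text nested in an excluded tag?" question. It assumes reasonably well-formed markup and a case-sensitive match.

```python
import re
from html.parser import HTMLParser

# Hypothetical names throughout; this is a sketch, not the original C# code.
# Elements whose text content must be ignored: <a> and <h1>..<h6>.
EXCLUDED = re.compile(r"^(a|h[1-6])$")


class OutsideTagMatcher(HTMLParser):
    """Collects text chunks containing `needle` that are not nested in an excluded element."""

    def __init__(self, needle):
        super().__init__()
        self.needle = needle
        self.excluded_depth = 0  # number of excluded elements currently open
        self.matches = []

    def handle_starttag(self, tag, attrs):
        # The parser reports tag names in lower case.
        if EXCLUDED.match(tag):
            self.excluded_depth += 1

    def handle_endtag(self, tag):
        if EXCLUDED.match(tag) and self.excluded_depth > 0:
            self.excluded_depth -= 1

    def handle_data(self, data):
        # Attribute values never arrive here, so text such as href="..." is skipped for free.
        if self.excluded_depth == 0 and self.needle in data:
            self.matches.append(data.strip())


def find_outside(html, needle):
    """Return the text chunks containing `needle` that lie outside <a> and <h*> elements."""
    parser = OutsideTagMatcher(needle)
    parser.feed(html)
    parser.close()  # flush any buffered trailing text
    return parser.matches


SAMPLE = (
    '<h1>TfL</h1>\n'
    '<a href="tfl.gov.uk">Tfl</a>\n'
    'TfL is official organization for keeping London moving.'
)

if __name__ == "__main__":
    print(find_outside(SAMPLE, "TfL"))
```

For the example from the question, only the final sentence is reported: the "TfL" in the heading is skipped because an excluded element is open at that point, and the text in the link is both inside an anchor and spelled differently ("Tfl").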