I know that using regular expressions to parse or manipulate HTML/XML is a bad idea, and I would normally never do it. But I am considering it for lack of alternatives.
I need to replace text inside a string that is not already part of a tag (ideally a span tag with specific id) using C#.
For example, let's say I want to replace all instances of ABC in the following text that are not inside a span with alternate text (another span in my case):
ABC at start of line or ABC here must be replaced but, <span id="__publishingReusableFragment" >ABC inside span must not be replaced with anything. Another ABC here </span> this ABC must also be replaced
I tried using regex with both lookahead and lookbehind assertions, in various combinations along the lines of
string regexPattern = "(?<!id=\"__publishingReusableFragment\").*?" + stringToMatch + ".*?(?!span)";
but gave up on that.
I also tried loading it into an XElement, creating a writer from there, and getting the text that is not inside a node, but couldn’t figure that out either.
XElement xel = XElement.Parse("<payload>" + inputString + @"</payload>");
XmlWriter requiredWriter = xel.CreateWriter();
I am hoping to somehow use the writer to get the strings that are not part of a node and replace them.
Basically I am open to any suggestions/solutions to solve this problem.
Thanks in advance for the help.
will work with all the caveats about HTML parsing (that you seem to know, so I won’t repeat them here) still valid.
The regex matches ABC if it’s not preceded by an opening <span id="__publishingReusableFragment"> tag and if there is no closing </span> tag between the two. It will obviously fail if there can be nested <span> tags.
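As a minimal sketch of a pattern that fits this description, the following relies on .NET's support for variable-length lookbehind. The class name SpanAwareReplacer, the method name ReplaceOutsideSpan, and the replacement text "XYZ" are hypothetical names chosen for illustration, and the pattern assumes that id is the first attribute of the span and that spans are not nested.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SpanAwareReplacer
{
    // Replaces every occurrence of stringToMatch that is NOT inside a
    // <span id="__publishingReusableFragment"> ... </span> block.
    // Assumption: spans are not nested and id is the first attribute.
    public static string ReplaceOutsideSpan(string input, string stringToMatch, string replacement)
    {
        // Negative lookbehind: fail if an opening span tag precedes the match
        // with no closing </span> between that tag and the match.
        string pattern =
            "(?<!<span\\s+id=\"__publishingReusableFragment\"[^>]*>(?:(?!</span>).)*)"
            + Regex.Escape(stringToMatch);

        // A MatchEvaluator is used so that '$' in the replacement is taken literally.
        // Singleline lets '.' cross line breaks inside a span.
        return Regex.Replace(input, pattern, _ => replacement, RegexOptions.Singleline);
    }
}
```

Calling SpanAwareReplacer.ReplaceOutsideSpan(inputString, "ABC", "XYZ") on the example text from the question should replace the first two occurrences and the last one, and leave the two inside the span alone. Because Regex.Replace scans the original input, a replacement that itself contains a span does not affect later matches.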