I am trying to come up with a way to match content that is not inside any XML or HTML tags. I’ve read that using regular expressions is fundamentally bad for parsing XML/HTML, and I’m open to any solution that will solve my problem, but if a regex works too, all the better.
Here’s an example of what I’m looking for:
the lazy fox jumped <span>over</span> the brown fence.
What I want back is
the lazy fox jumped the brown fence
Any ideas?
It’s probably a naive technique, but my first instinct would be to run the regular expression, figure out what text it matches within your parent string, and REMOVE it from that string, returning the remainder. In pseudocode,
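One way to sketch this idea in Python, offered as an illustration under assumptions rather than a definitive answer: the names `TAGGED`, `strip_tagged_regex`, `OutsideTagText`, and `strip_tagged_parser` are made up for this example, and the regex version assumes well-formed markup where an element does not contain another element of the same name. Because that assumption often fails on real markup, a second version using the standard library's `html.parser` is included; it tracks nesting depth and keeps only text seen at depth zero.

```python
import re
from html.parser import HTMLParser

# Matches an opening tag, everything up to the first closing tag of the
# same name, and that closing tag. Assumption: no same-name nesting.
TAGGED = re.compile(r"<(\w+)[^>]*>.*?</\1\s*>", re.DOTALL)


def strip_tagged_regex(text):
    """Remove each tagged element (tags and content), return the remainder."""
    remainder = TAGGED.sub("", text)
    # Collapse the whitespace left behind where an element was removed.
    return re.sub(r"\s+", " ", remainder).strip()


# Elements that never have a closing tag; this short list is illustrative,
# not exhaustive.
VOID = {"br", "hr", "img", "input", "meta", "link"}


class OutsideTagText(HTMLParser):
    """Collects only the text that is not enclosed by any element."""

    def __init__(self):
        super().__init__()
        self.depth = 0   # how many elements are currently open
        self.parts = []  # text fragments seen at depth 0

    def handle_starttag(self, tag, attrs):
        if tag not in VOID:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag not in VOID and self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth == 0:
            self.parts.append(data)


def strip_tagged_parser(text):
    """Parser-based variant; copes with nested elements."""
    parser = OutsideTagText()
    parser.feed(text)
    parser.close()  # flush any text still buffered by the parser
    return re.sub(r"\s+", " ", "".join(parser.parts)).strip()


sample = "the lazy fox jumped <span>over</span> the brown fence."
print(strip_tagged_regex(sample))   # the lazy fox jumped the brown fence.
print(strip_tagged_parser(sample))  # the lazy fox jumped the brown fence.
```

Both functions keep the trailing period from the input and collapse the double space left where the `<span>` was removed. On nested input such as `a <b>x <b>y</b> z</b> c`, the regex version stops at the first `</b>` and leaves stray markup behind, while the parser version returns `a c`.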