I have some SGML that I’m trying to clean up by adding closing tags to the opening ones. Right now, the document has a structure like this:
<CAT>
<NAME>Daniel
<COLOR>White
<DESC>Daniel is a white cat <p>He was born in July</p><br />He's super cute.<p><br />He does not have any siblings.
<COUNTRY>USA
</CAT>
So far I can match an open tag and capture its content as a group using this regexp, as long as the content doesn’t contain any <p>, </p>, or <br /> elements:
<NAME>([^\\<]+)[^<]
But if I use
<DESC>([^\\<]+)[^<]
the match stops right before the first <p>.
I’m using < as the end of the pattern because none of the other open nodes contain HTML elements that would stop the match early.
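To see why the pattern stops early, here is a minimal reproduction using Python’s `re` module (the language choice and the helper name `first_field` are assumptions for illustration). In a raw Python string, `[^\\<]` is the character class “neither a backslash nor `<`”, so on the <DESC> line the greedy run ends at the `<` of the first <p>.

```python
import re
from typing import Optional

# The record from the question.
SGML = """<CAT>
<NAME>Daniel
<COLOR>White
<DESC>Daniel is a white cat <p>He was born in July</p><br />He's super cute.<p><br />He does not have any siblings.
<COUNTRY>USA
</CAT>"""


def first_field(tag: str) -> Optional[str]:
    """Apply the question's pattern to a single tag (hypothetical helper)."""
    # [^\\<]+ grabs a run of characters that are neither '\' nor '<';
    # the trailing [^<] then forces one character (here the newline, or the
    # space before '<p>') out of the captured group.
    pattern = r"<" + re.escape(tag) + r">([^\\<]+)[^<]"
    m = re.search(pattern, SGML)
    return m.group(1) if m else None


print(repr(first_field("NAME")))  # prints 'Daniel'
print(repr(first_field("DESC")))  # prints 'Daniel is a white cat'
```

The <NAME> field comes out whole because its value ends at a newline, while <DESC> is cut at the first embedded tag.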
How can I make a regexp that matches the <DESC> node, including any <p>, </p>, and <br /> elements inside it, and ends before the <COUNTRY> node?
How about this:
This allows these three tags to match and stops at the next < that doesn’t belong to one of the three. By the way, why aren’t you allowing the backslash as a valid character?
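A minimal Python sketch of that whitelist idea follows; the exact pattern is an assumption, not necessarily the one posted in the answer. The body of a field is any run of characters other than `<`, plus the three inline tags <p>, </p> and <br />; any other `<` (such as the one opening <COUNTRY>) ends the field. Following the remark about backslashes, it uses `[^<]` rather than `[^\\<]`. The names `FIELD_BODY`, `field` and `close_fields` are hypothetical, and `close_fields` shows one possible way to use the match to add the missing closing tags.

```python
import re
from typing import List, Optional

# The record from the question.
SGML = """<CAT>
<NAME>Daniel
<COLOR>White
<DESC>Daniel is a white cat <p>He was born in July</p><br />He's super cute.<p><br />He does not have any siblings.
<COUNTRY>USA
</CAT>"""

# A field body is any run of characters other than '<', plus the three
# whitelisted inline tags <p>, </p> and <br />. Any other '<' ends the body.
FIELD_BODY = r"(?:[^<]|<(?:/?p|br\s*/)>)+"


def field(tag: str) -> Optional[str]:
    """Return the content of <tag>, stopping at the next non-whitelisted tag."""
    m = re.search("<" + re.escape(tag) + ">(" + FIELD_BODY + ")", SGML)
    return m.group(1).strip() if m else None


def close_fields(text: str, tags: List[str]) -> str:
    """Hypothetical helper: add </TAG> after each listed field's content."""
    pattern = re.compile(
        "<(" + "|".join(map(re.escape, tags)) + ")>(" + FIELD_BODY + ")"
    )

    def repl(m):
        tag, body = m.group(1), m.group(2)
        content = body.rstrip()          # keep the trailing newline outside
        trailing = body[len(content):]
        return "<" + tag + ">" + content + "</" + tag + ">" + trailing

    return pattern.sub(repl, text)


print(field("DESC"))
print(close_fields(SGML, ["NAME", "COLOR", "DESC", "COUNTRY"]))
```

Because the body is greedy but can only cross whitelisted tags, <DESC> now runs through the embedded <p> and <br /> elements and stops at the newline before <COUNTRY>. If a field could legitimately contain other tags, they would need to be added to the alternation.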