I am writing a regular expression that should match only malformed tags, like: <p> *some text here, possibly with other tags as well, but no ending 'p' tag* </p>
<P>Affectionately Inscribed </P><P>TO </P><P>HENRY BULLAR, </P><P>(of the western circuit)<P>PREFACE</P>
From the sample text above, I want to capture only <P>(of the western circuit)<P> and nothing else. I'm using the pattern below, but it's not working:
<P>[^\(</P>\)]*<P>
Please help.
Regex is not always a good choice for XML/HTML-type data. In particular, attributes, case sensitivity, comments, etc. all have a big impact.
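That said, the pattern in the question fails for a specific reason: `[^\(</P>\)]` is a character class, so it excludes the single characters `(`, `<`, `/`, `P`, `>` and `)` rather than the sequence `</P>`, and the `(` in the sample text stops the match. If a regex is still wanted for a quick one-off, a negative lookahead is one way to say "no `<P>` or `</P>` in between". This is only a sketch in Python (the question does not name a language; the same lookahead syntax also works in .NET), the names `UNCLOSED_P`, `find_unclosed` and `sample` are mine, and it assumes bare `<P>` tags with no attributes:

```python
import re

# An opening <P>, then any run of characters in which no <P> or </P>
# begins, then the next opening <P>. Assumes tags carry no attributes.
UNCLOSED_P = re.compile(r"<P>(?:(?!</?P>).)*<P>", re.IGNORECASE | re.DOTALL)


def find_unclosed(text):
    """Return every '<P>...<P>' span that has no closing </P> in between."""
    return UNCLOSED_P.findall(text)


sample = (
    "<P>Affectionately Inscribed </P><P>TO </P><P>HENRY BULLAR, </P>"
    "<P>(of the western circuit)<P>PREFACE</P>"
)

print(find_unclosed(sample))  # ['<P>(of the western circuit)<P>']
```

Note that the match consumes the second `<P>`, so two unclosed paragraphs in a row would be reported as one match, not two.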
For xhtml, I'd use XmlDocument/XDocument and an XPath query. For 'non-x' html, I'd look at the HTML Agility Pack and the same.
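To illustrate the parser-based idea without the .NET libraries named above, here is a rough sketch using Python's standard `html.parser` as a stand-in. It is not the XPath or HTML Agility Pack approach itself: it simply walks the tag events and records the text of any `<p>` that is still open when the next `<p>` starts. The class and function names are hypothetical.

```python
from html.parser import HTMLParser


class UnclosedParagraphFinder(HTMLParser):
    """Collects the text of each <p> that is followed by another <p>
    before its own </p> appears."""

    def __init__(self):
        super().__init__()
        self.current = None  # text chunks of the open <p>, or None if none is open
        self.unclosed = []   # text of every <p> that was never closed

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names, so <P> and <p> both arrive as "p".
        if tag == "p":
            if self.current is not None:
                self.unclosed.append("".join(self.current))
            self.current = []

    def handle_endtag(self, tag):
        if tag == "p":
            self.current = None

    def handle_data(self, data):
        if self.current is not None:
            self.current.append(data)


def find_unclosed_paragraphs(html):
    finder = UnclosedParagraphFinder()
    finder.feed(html)
    finder.close()
    return finder.unclosed


sample_html = (
    "<P>Affectionately Inscribed </P><P>TO </P><P>HENRY BULLAR, </P>"
    "<P>(of the western circuit)<P>PREFACE</P>"
)

print(find_unclosed_paragraphs(sample_html))  # ['(of the western circuit)']
```

Because the parser handles case and attributes itself, this also copes with input such as `<p class="x">`, which a hand-written pattern would have to allow for explicitly.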