Possible duplicates:
- How to remove HTML tag in Java
- RegEx match open tags except XHTML self-contained tags
I want to remove a specific HTML tag together with its content.
For example, given this HTML:
<span style='font-family:Verdana;mso-bidi-font-family:
"Times New Roman";display:none;mso-hide:all'>contents</span>
If the tag's attributes contain an “mso-*” property, the whole element (opening tag, closing tag and content) must be removed.
As Dave Newton pointed out in his comment, an HTML parser is the way to go here. If you really want to do it the hard way, here's a regex that works:
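A minimal sketch of the regex approach in Java, using only `java.util.regex`. The pattern is an illustrative assumption, not necessarily the regex the answer refers to. It assumes the elements to strip are `<span>` tags whose opening tag contains "mso-" somewhere in its attributes, and that such spans are not nested inside other `<span>` elements. The class and method names (`StripMsoSpans`, `removeMsoSpans`) are hypothetical.

```java
import java.util.regex.Pattern;

public class StripMsoSpans {
    // Illustrative pattern (an assumption, not the original answer's regex):
    //   <span\b     the opening <span tag
    //   [^>]*mso-   "mso-" somewhere inside that same opening tag ([^>] cannot cross '>')
    //   [^>]*>      the rest of the opening tag
    //   .*?</span>  the shortest content up to the first closing </span>
    // Limitation: a nested <span> inside the matched element would leave a stray </span>.
    private static final Pattern MSO_SPAN = Pattern.compile(
            "<span\\b[^>]*mso-[^>]*>.*?</span>",
            Pattern.CASE_INSENSITIVE | Pattern.DOTALL); // DOTALL lets content span lines

    /** Removes every matching span element, including its content. */
    public static String removeMsoSpans(String html) {
        return MSO_SPAN.matcher(html).replaceAll("");
    }

    public static void main(String[] args) {
        // The example from the question, with a line break inside the style attribute.
        String html = "before <span style='font-family:Verdana;mso-bidi-font-family:\n"
                + "\"Times New Roman\";display:none;mso-hide:all'>contents</span> after";
        System.out.println(removeMsoSpans(html)); // prints "before  after"
    }
}
```

Spans without an "mso-" property are left untouched, because `[^>]*` keeps the search for "mso-" inside a single opening tag. For nested or malformed markup, an HTML parser remains the more robust choice.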