I am trying to take a string containing HTML, strip out some tags (img, object), and strip the attributes from all other HTML tags. For example:
<div id="someId" style="color: #000000">
<p class="someClass">Some Text</p>
<img src="images/someimage.jpg" alt="" />
<a href="somelink.html">Some Link Text</a>
</div>
Would become:
<div>
<p>Some Text</p>
Some Link Text
</div>
I am trying:
string.replaceAll("<\/?[img|object](\s\w+(\=\".*\")?)*\>", ""); //REMOVE img/object
I am not sure how to strip all attributes inside a tag though.
Any help would be appreciated.
Thanks.
You can remove all attributes like this:
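Here is a minimal Java sketch. It assumes a pattern of the form `(<\w+)[^>]*(>)`, reconstructed from the explanation that follows, and a hypothetical helper named `stripAttributes`. Treat both as assumptions rather than the exact original code.

```java
public class AttributeStripper {

    // Hypothetical helper. Group 1 captures "<" plus the tag name (e.g. "<div"),
    // group 2 captures the closing ">". Everything matched in between (the
    // attributes) is dropped because the replacement is only "$1$2".
    static String stripAttributes(String html) {
        return html.replaceAll("(<\\w+)[^>]*(>)", "$1$2");
    }

    public static void main(String[] args) {
        String html = "<div id=\"someId\" style=\"color: #000000\">"
                + "<p class=\"someClass\">Some Text</p></div>";
        // Prints: <div><p>Some Text</p></div>
        System.out.println(stripAttributes(html));
    }
}
```

Closing tags such as `</div>` are not touched, because `/` is not a word character and so `<\w+` cannot match there.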
This expression matches an opening tag, but captures only its header `<div` and the closing `>` as groups 1 and 2. `replaceAll` uses references to these groups to join them back in the output as `$1$2`. This cuts out the attributes in the middle of the tag.
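The question also asks to drop the `img` and `object` tags. In the attempt above, `[img|object]` is a character class that matches a single character (one of `i`, `m`, `g`, `|`, `o`, `b`, `j`, `e`, `c`, `t`), not either word. Alternation needs a group such as `(?:img|object)`. Also, backslashes in a Java string literal must be doubled (`\\w`), since escapes like `\/` or `\w` do not compile.

The sketch below is a hypothetical combination that removes those tags first and then strips attributes from what remains. Adding `a` to the list is an assumption made to match the expected output in the question, which also drops the link tag. Text between removed tags is kept. Like any regex approach, this assumes simple markup, for example with no `>` inside attribute values.

```java
public class CleanHtml {

    // Removes opening, closing and self-closing img/object/a tags but keeps
    // the text between them. Including "a" is an assumption based on the
    // question's expected output. (?i) makes tag names case-insensitive, and
    // \b stops "a" from also matching tags such as <abbr>.
    static String removeTags(String html) {
        return html.replaceAll("(?i)</?(?:img|object|a)\\b[^>]*>", "");
    }

    // Keeps only "<tagname" and ">" of each opening tag, dropping attributes.
    static String stripAttributes(String html) {
        return html.replaceAll("(<\\w+)[^>]*(>)", "$1$2");
    }

    public static void main(String[] args) {
        String html = "<div id=\"someId\" style=\"color: #000000\">\n"
                + "<p class=\"someClass\">Some Text</p>\n"
                + "<img src=\"images/someimage.jpg\" alt=\"\" />\n"
                + "<a href=\"somelink.html\">Some Link Text</a>\n"
                + "</div>";
        // Remove the unwanted tags first, then strip attributes from the rest.
        System.out.println(stripAttributes(removeTags(html)));
    }
}
```

This prints the expected markup from the question, except that an empty line remains where the `<img>` tag used to be.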