My input is a plain text string, and the requirement is to remove all HTML tags except a few specific ones, such as:
<p>
<li>
<u>
If these specific tags have attributes like class or id, I want to remove these attributes.
A few examples:
<a href = "#">Link</a> -> Link
<p>paragraph</p> -> <p>paragraph</p>
<p class="class1">paragraph</p> -> <p>paragraph</p>
I have gone through "Remove HTML tags from a String", but it does not answer my question completely.
Can this be handled by a set of regexes, or could I make use of some library?
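For the regex route, here is a minimal sketch of what a single-pattern approach might look like, using only the Java standard library. The class name `TagFilter`, the method `stripTags`, and the `KEEP` set are hypothetical names chosen for illustration. It assumes reasonably well-formed tags: an attribute value containing `>`, HTML comments, and `<script>` or `<style>` bodies are not handled, which is the usual reason a real HTML parser is the safer choice.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: keeps a fixed set of tags (without attributes), drops all others.
final class TagFilter {
    // Tags to keep, taken from the list in the question.
    private static final Set<String> KEEP = new HashSet<>(Arrays.asList("p", "li", "u"));

    // Group 1: optional "/" of a closing tag. Group 2: tag name.
    // Everything up to the next ">" (the attributes) is matched but not captured.
    private static final Pattern TAG =
            Pattern.compile("<(/?)([a-zA-Z][a-zA-Z0-9]*)\\b[^>]*>");

    static String stripTags(String input) {
        Matcher m = TAG.matcher(input);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String name = m.group(2).toLowerCase(Locale.ROOT);
            // Kept tags are re-emitted bare (attributes dropped); other tags become "".
            String replacement = KEEP.contains(name) ? "<" + m.group(1) + name + ">" : "";
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

Calling `TagFilter.stripTags(...)` on the three examples above should give `Link`, `<p>paragraph</p>`, and `<p>paragraph</p>` respectively.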
I tried jsoup, and it seems to be able to handle all such cases. Here is example code.
For the input string
I get the following output, which is pretty much what I require.