I am using a regex to remove HTML tags. I do something like this:
result = result.replaceAll("<.*?>", "");
However, it does not remove the img tags from the HTML. What is a good way to do that?
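One possible cause, assuming the img tags in the input are split across several lines: in Java regex, `.` does not match line terminators by default, so `<.*?>` cannot match a tag such as `<img` followed by a newline and then `src="...">`. The minimal sketch below turns on the `(?s)` (DOTALL) flag so that `.` also matches newlines. The class and method names are hypothetical. A regex still cannot handle every HTML edge case, such as a `>` inside an attribute value.

```java
import java.util.regex.Pattern;

public class RegexTagStripper {
    // (?s) enables DOTALL so "." also matches line breaks; without it,
    // a tag split across lines (e.g. "<img\n src=...>") is left untouched.
    private static final Pattern TAG = Pattern.compile("(?s)<.*?>");

    // Java strings are immutable, so the caller must use the returned value.
    public static String stripTags(String html) {
        return TAG.matcher(html).replaceAll("");
    }

    public static void main(String[] args) {
        String html = "<p>Hello <img\n  src=\"cat.png\"\n  alt=\"cat\"> world</p>";
        // Without DOTALL the multi-line img tag survives.
        System.out.println(html.replaceAll("<.*?>", ""));
        // With DOTALL the img tag is removed as well.
        System.out.println(stripTags(html));
    }
}
```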
To give a more concrete recommendation, use JSoup (or NekoHTML) to parse the HTML into a Java object.
Once you have a Document object, it can easily be traversed to remove the tags. This cookbook recipe shows how to get attributes and text from the DOM.
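Here is a minimal sketch of the parser approach. It assumes jsoup (Maven artifact `org.jsoup:jsoup`) is on the classpath and uses jsoup, not NekoHTML. The class and method names are hypothetical. Instead of walking the tree by hand, it selects every `img` element with a CSS selector and removes it from the parsed Document. It then returns either the plain text or the remaining markup.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class JsoupTagStripper {
    // Parses the HTML, drops every <img> element, and returns only the
    // visible text (jsoup's text() also collapses runs of whitespace).
    public static String textWithoutImages(String html) {
        Document doc = Jsoup.parse(html);
        doc.select("img").remove();   // remove all img elements from the tree
        return doc.text();            // text content only; no tags remain
    }

    // Alternative: keep the rest of the markup and strip only the img elements.
    public static String htmlWithoutImages(String html) {
        Document doc = Jsoup.parse(html);
        doc.select("img").remove();
        return doc.body().html();
    }

    public static void main(String[] args) {
        String html = "<p>Hello <img\n src=\"cat.png\"> world</p>";
        System.out.println(textWithoutImages(html));
        System.out.println(htmlWithoutImages(html));
    }
}
```

Because jsoup parses the markup into a tree, line breaks or `>` characters inside attribute values do not affect which elements are removed.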