I have an HTML fragment containing <br> and <p></p> tags. I was able to strip all the HTML tags, but doing so leaves the text badly formatted.
I want something like PHP's nl2br(), except in reverse (HTML in, plain text with newlines out), and it should also take <p> tags into account. Is there a library for this in Java?
You basically need to replace each <br> with \n and each <p> with \n\n. So, at the points where you succeed in removing them, you need to insert \n and \n\n respectively.

Here's a kickoff example using the Jsoup HTML parser (the HTML example is intentionally written so that it is hard, if not nearly impossible, to use regex for this).
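A minimal sketch of this approach, assuming Jsoup is on the classpath. The class name Br2Nl, the method name br2nl and the sample input are illustrative choices, not part of any library. Because text() collapses real whitespace, the sketch first marks each break with a literal two-character backslash-n placeholder and swaps it for a real newline at the end:

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class Br2Nl {

    // Hypothetical helper: converts an HTML fragment to plain text,
    // keeping <br> as one newline and <p> as two newlines.
    public static String br2nl(String html) {
        Document document = Jsoup.parse(html);
        // Insert a literal backslash-n placeholder after every <br> ...
        document.select("br").after("\\n");
        // ... and two placeholders at the start of every <p>.
        document.select("p").prepend("\\n\\n");
        // text() strips the tags; then turn the placeholders into real newlines.
        return document.text().replace("\\n", "\n");
    }

    public static void main(String[] args) {
        // Unquoted attributes and a commented-out <br> make this awkward for regex.
        String html = "<p>Line1<br class=b>Line2<!-- <br> --></p><p id=x>Line3</p>";
        System.out.println(br2nl(html));
    }
}
```

The exact spacing around the inserted newlines may vary with the Jsoup version, since text() can add a space at element boundaries; a trim per line afterwards would tidy that up if it matters.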
(Note: replaceAll() is unnecessary, as we just want a simple charsequence-by-charsequence replacement here, not a regex-pattern-by-charsequence replacement.)
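To illustrate the difference, here is a small sketch that assumes the text contains a literal two-character backslash-n placeholder that should become a real newline. The class and method names are made up for the demonstration. String.replace() treats its first argument as plain text, while String.replaceAll() compiles it as a regex, where backslash-n means "a newline character" and so does not match the placeholder:

```java
public class ReplaceDemo {

    // A literal backslash followed by 'n' (two characters), not a newline.
    static final String PLACEHOLDER = "\\n";

    // Plain charsequence replacement: finds the two-character placeholder.
    public static String viaReplace(String s) {
        return s.replace(PLACEHOLDER, "\n");
    }

    // Regex replacement: the pattern \n matches a real newline character,
    // so the placeholder is left untouched.
    public static String viaReplaceAll(String s) {
        return s.replaceAll(PLACEHOLDER, "\n");
    }

    public static void main(String[] args) {
        String text = "Line1\\nLine2"; // contains backslash + n, no newline
        System.out.println(viaReplace(text).equals("Line1\nLine2")); // true
        System.out.println(viaReplaceAll(text).equals(text));        // true
    }
}
```

Getting the same result from replaceAll() would require escaping the backslash for the regex engine as well ("\\\\n" in Java source), which is why replace() is the simpler choice here.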
A bit hacky, but it works.