I’m cleaning some text from unwanted HTML tags (such as <script>) by using
String clean = Jsoup.clean(someInput, Whitelist.basicWithImages());
The problem is that it replaces, for instance, å with &aring; (which causes trouble for me since it’s not “pure XML”).
For example
Jsoup.clean("hello å <script></script> world", Whitelist.basicWithImages())
yields
"hello &aring; world"
but I would like
"hello å world"
Is there a simple way to achieve this? (I.e. simpler than converting &aring; back to å in the result.)
You can configure Jsoup’s escaping mode: using EscapeMode.xhtml will give you output without entities.
Here’s a complete snippet that accepts str as input, and cleans it using Whitelist.simpleText():
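A minimal sketch of such a snippet, assuming jsoup's Cleaner class is used for the whitelist step. The class name CleanWithoutEntities and the helper cleanKeepingCharacters are hypothetical names chosen for illustration, and on newer jsoup releases Whitelist has been renamed to Safelist, so the import may need adjusting:

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Entities.EscapeMode;
import org.jsoup.safety.Cleaner;
import org.jsoup.safety.Whitelist;

public class CleanWithoutEntities {

    // Hypothetical helper (not part of jsoup): cleans str and returns the
    // body HTML with characters such as å left as-is instead of &aring;.
    static String cleanKeepingCharacters(String str) {
        // Parse str into a Document.
        Document dirty = Jsoup.parse(str);

        // Keep only the tags permitted by the whitelist.
        Document clean = new Cleaner(Whitelist.simpleText()).clean(dirty);

        // Switch the escape mode so only the XML-reserved characters
        // are written as entities.
        clean.outputSettings().escapeMode(EscapeMode.xhtml);

        // Get back the string of the body.
        return clean.body().html();
    }

    public static void main(String[] args) {
        // "\u00e5" is å, written as an escape to avoid source-encoding issues.
        System.out.println(cleanKeepingCharacters("hello \u00e5 <script></script> world"));
    }
}
```

The escape mode has to be set on the cleaned document's output settings before html() is called, since it only affects serialization and not the parsed tree.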