We’re using Jsoup.clean(String, Whitelist) to process some input, and it appears that Jsoup is adding an extraneous line break just prior to acceptable tags. I’ve seen a few people post this issue around the internet, but haven’t been able to track down a solution.
For instance, let’s say we have a very simple string with some bold tags within it, like so:
String htmlToClean = "This is a line with <b>bold text</b> within it.";
String returnString = Jsoup.clean(htmlToClean, Whitelist.relaxed());
System.out.println(returnString);
What comes out of the call to the clean() method is something like so:
This is a line with \n<b>bold text</b> within it.
Notice the extraneous “\n” inserted just before the opening bold tag. I can’t seem to track down where in the source this is being added (although admittedly I’m new to Jsoup).
Has anyone encountered this problem and, better yet, found some way to keep this extra, unwanted character from being inserted into the string?
Hmm… have not seen any options for this.
If you parse the HTML into a Document, you have some output settings available.
With prettyPrint off you’ll get the following output:
This is a line with <b>bold text</b> within it.
Maybe you can write your own clean() method, since the implemented one uses Documents (there you can disable prettyPrint).
Original methods:
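As a rough sketch of the custom clean() suggested above (this is not jsoup's own source): parse the fragment into a Document, run it through a Cleaner, switch prettyPrint off on the cleaned Document, and only then serialize the body. The class name CleanWithoutPrettyPrint and the helper cleanNoPrettyPrint are hypothetical names made up for this example. It uses Whitelist to match the question; newer jsoup releases rename that class to Safelist, so swap the type there.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.safety.Cleaner;
import org.jsoup.safety.Whitelist; // assumption: older jsoup; newer releases use org.jsoup.safety.Safelist

public class CleanWithoutPrettyPrint {

    // Hypothetical helper, not part of jsoup: cleans the input against the
    // given whitelist, but serializes the result with pretty-printing off.
    static String cleanNoPrettyPrint(String bodyHtml, Whitelist whitelist) {
        // Parse the fragment into a Document (empty base URI, as in the question's call).
        Document dirty = Jsoup.parseBodyFragment(bodyHtml, "");

        // Copy only the whitelisted elements and attributes into a new Document.
        Cleaner cleaner = new Cleaner(whitelist);
        Document clean = cleaner.clean(dirty);

        // Disable pretty-printing so no line breaks or indentation are added on output.
        clean.outputSettings().prettyPrint(false);

        return clean.body().html();
    }

    public static void main(String[] args) {
        String htmlToClean = "This is a line with <b>bold text</b> within it.";
        String returnString = cleanNoPrettyPrint(htmlToClean, Whitelist.relaxed());
        System.out.println(returnString);
        // prints: This is a line with <b>bold text</b> within it.
    }
}
```

Later jsoup versions also provide a Jsoup.clean overload that accepts a Document.OutputSettings argument, which may make a custom helper like this unnecessary if your version has it.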