The Archive Base

Editorial Team
Asked: June 18, 2026

I am evaluating jsoup for functionality that would sanitize (but not remove!) non-whitelisted tags. Let’s say only the <b> tag is allowed, so the following input

foo <b>bar</b> <script onLoad='stealYourCookies();'>baz</script>

has to yield the following:

foo <b>bar</b> &lt;script onLoad='stealYourCookies();'&gt;baz&lt;/script&gt;

I see the following problems/questions with jsoup:

  • document.getAllElements() always assumes <html>, <head> and <body>. Yes, I can call document.body().getAllElements() but the point is that I don’t know if my source is a full HTML document or just the body — and I want the result in the same shape and form as it came in;
  • how do I replace <script>...</script> with &lt;script&gt;...&lt;/script&gt;? I only want to replace the brackets with escaped entities and do not want to alter any attributes, etc. Node.replaceWith sounds like overkill for this;
  • Is it possible to completely switch off pretty printing (e.g. insertion of new lines, etc.)?

Or maybe I should use another framework? I have peeked at htmlcleaner so far, but the given examples don’t suggest my desired functionality is supported.

1 Answer

  1. Editorial Team
    Added an answer on June 18, 2026 at 11:55 am

    Answer 1

    How do you load and parse your Document with jsoup? If you use parse() or connect().get(), jsoup will automatically normalize your HTML by inserting the html, head and body tags. This ensures you always have a complete HTML document, even if the input is incomplete.

    If you only want to clean an input (with no further processing), use clean() instead of the methods listed above. Note that clean() removes disallowed tags rather than escaping them.

    Example 1 – Using parse()

    final String html = "<b>a</b>";
    
    System.out.println(Jsoup.parse(html));
    

    Output:

    <html>
     <head></head>
     <body>
      <b>a</b>
     </body>
    </html>
    

    The input HTML is completed so that you have a full document.

    Example 2 – Using clean()

    final String html = "<b>a</b>";
    
    // Pass the variable declared above. Newer jsoup releases name this class Safelist.
    System.out.println(Jsoup.clean(html, Whitelist.relaxed()));
    

    Output:

    <b>a</b>
    

    The input HTML is cleaned, nothing more.

    Documentation:

    • Jsoup
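Because parse() always builds a full document, keeping the output in the same shape as the input means remembering what came in. The following is a minimal sketch and not part of jsoup: `HtmlShape` and `looksLikeFullDocument` are hypothetical names, and the check is a plain regex heuristic that assumes a full document announces itself with a doctype or an `<html>`, `<head>` or `<body>` start tag.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not a jsoup class.
final class HtmlShape {
    // Assumption: a doctype or an <html>, <head> or <body> start tag marks a full document.
    // The \b keeps tags such as <header> from matching "head".
    private static final Pattern DOCUMENT_MARKER =
            Pattern.compile("<(?:!doctype|html|head|body)\\b", Pattern.CASE_INSENSITIVE);

    // Returns true if the input looks like a full document rather than a body fragment.
    static boolean looksLikeFullDocument(String html) {
        return DOCUMENT_MARKER.matcher(html).find();
    }

    public static void main(String[] args) {
        System.out.println(looksLikeFullDocument("<b>a</b>"));                           // prints false
        System.out.println(looksLikeFullDocument("<html><body><b>a</b></body></html>")); // prints true
    }
}
```

With that flag in hand, one option after parsing is to print doc.outerHtml() for a full document and doc.body().html() for a fragment.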

    Answer 2

    The method replaceWith() does exactly what you need:

    Example:

    final String html = "<b><script>your script here</script></b>";
    Document doc = Jsoup.parse(html);
    
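    // Replace each <script> element with a text node holding its markup;
    // jsoup escapes the angle brackets when the text node is written out.
    for( Element element : doc.select("script") )
    {
        element.replaceWith(TextNode.createFromEncoded(element.toString(), null));
    }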
    
    System.out.println(doc);
    

    Output:

    <html>
     <head></head>
     <body>
      <b>&lt;script&gt;your script here&lt;/script&gt;</b>
     </body>
    </html>
    

    Or body only:

    System.out.println(doc.body().html());
    

    Output:

    <b>&lt;script&gt;your script here&lt;/script&gt;</b>
    

    Documentation:

    • Node.replaceWith(Node in)
    • TextNode
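For comparison, the "escape only the brackets, leave attributes untouched" behaviour from the question can be approximated without a parser. This is not the replaceWith() approach above but a swapped-in regex pass using only java.util.regex, offered as a rough sketch: `TagEscaper` and `escapeDisallowed` are hypothetical names, and the code assumes there is no `>` inside attribute values and no comments or CDATA sections. An allowed tag is kept only when it carries no attributes. Stray `<` characters outside tags are left as they are, so this should not be treated as a security boundary.

```java
import java.util.Locale;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not a jsoup class.
final class TagEscaper {
    // Group 1: tag name; group 2: everything between the name and the closing '>'.
    private static final Pattern TAG =
            Pattern.compile("</?([A-Za-z][A-Za-z0-9]*)([^>]*)>");

    // Escapes the angle brackets of every tag that is not a bare, allowed tag.
    static String escapeDisallowed(String html, Set<String> allowedTags) {
        Matcher m = TAG.matcher(html);
        StringBuilder out = new StringBuilder();
        int last = 0;
        while (m.find()) {
            out.append(html, last, m.start()); // copy the text before the tag unchanged
            String tag = m.group();
            boolean allowed = allowedTags.contains(m.group(1).toLowerCase(Locale.ROOT))
                    && m.group(2).trim().isEmpty(); // allowed tags must have no attributes
            if (allowed) {
                out.append(tag);
            } else {
                // Keep the tag body verbatim; only the two brackets are replaced.
                out.append("&lt;").append(tag, 1, tag.length() - 1).append("&gt;");
            }
            last = m.end();
        }
        out.append(html, last, html.length()); // copy the remaining text
        return out.toString();
    }

    public static void main(String[] args) {
        String input = "foo <b>bar</b> <script onLoad='stealYourCookies();'>baz</script>";
        // prints: foo <b>bar</b> &lt;script onLoad='stealYourCookies();'&gt;baz&lt;/script&gt;
        System.out.println(escapeDisallowed(input, Set.of("b")));
    }
}
```

A parser-based version would be more robust against malformed markup; the regex form is shown only because it leaves the original attribute text byte for byte as it came in.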

    Answer 3

    Yes, the prettyPrint() method of Document.OutputSettings does this.

    Example:

    final String html = "<p>your html here</p>";
    
    Document doc = Jsoup.parse(html);
    doc.outputSettings().prettyPrint(false);
    
    System.out.println(doc);
    

    Note: if the outputSettings() method is not available, please update Jsoup.

    Output:

    <html><head></head><body><p>your html here</p></body></html>
    

    Documentation:

    • Document.OutputSettings.prettyPrint(boolean pretty)

    Answer 4 (no bullet)

    No! Jsoup is one of the best and most capable HTML libraries out there!
