The Archive Base


Editorial Team
Asked: May 27, 2026


I am parsing an HTML document using the http://lxml.de/ library. So far I have figured out how to strip tags from an HTML document (see "In lxml, how do I remove a tag but retain all contents?"), but the method described in that post leaves all the text: it strips the tags without removing the actual script. I have also found a class reference for lxml.html.clean.Cleaner (http://lxml.de/api/lxml.html.clean.Cleaner-class.html), but it is clear as mud how to actually use the class to clean the document. Any help, perhaps a short example, would be appreciated!


1 Answer

  1. Editorial Team
    Added an answer on May 27, 2026 at 12:55 pm

    Below is an example that does what you want. For an HTML document, Cleaner is a better general solution than strip_elements, because in cases like this you want to strip out more than just the <script> tag; you also want to get rid of things like onclick=function() attributes on other tags.

    #!/usr/bin/env python
    
    import lxml.html  # "import lxml" alone does not load the lxml.html submodule
    from lxml.html.clean import Cleaner  # on lxml 5.2+, this needs the separate lxml_html_clean package
    
    cleaner = Cleaner()
    cleaner.javascript = True  # True activates the JavaScript filter
    cleaner.style = True       # True activates the styles & stylesheet filter
    
    # Parse the page once; clean_html() works on a copy, so doc itself is not modified.
    doc = lxml.html.parse('http://www.google.com')
    
    print("WITH JAVASCRIPT & STYLES")
    print(lxml.html.tostring(doc, encoding='unicode'))
    print("WITHOUT JAVASCRIPT & STYLES")
    print(lxml.html.tostring(cleaner.clean_html(doc), encoding='unicode'))
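
The example above fetches a live page. As a minimal offline sketch, assuming the same Cleaner options, the snippet below cleans an inline HTML string instead. SAMPLE and strip_scripts_and_styles are hypothetical names introduced only for illustration, and the exact wrapper markup in the output may differ between lxml versions.

```python
from lxml.html.clean import Cleaner  # on lxml 5.2+, needs the lxml_html_clean package

# Hypothetical sample document: a script, a stylesheet, and an inline event handler.
SAMPLE = (
    "<html><head><script>alert('hi')</script>"
    "<style>p {color: red}</style></head>"
    '<body><p onclick="doEvil()">Hello <b>world</b></p></body></html>'
)


def strip_scripts_and_styles(html_text):
    """Return html_text with scripts, styles, and on* attributes removed."""
    cleaner = Cleaner(scripts=True, javascript=True, style=True)
    # When given a string, clean_html() returns a string.
    return cleaner.clean_html(html_text)


cleaned = strip_scripts_and_styles(SAMPLE)
print(cleaned)
```

The visible text ("Hello world") should survive, while the script body, the stylesheet, and the onclick attribute should be gone.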
    

    You can find the list of options you can set in the lxml.html.clean.Cleaner documentation; some options are simple booleans that you set to True or False (the defaults vary by option), and others take a list like:

    cleaner.kill_tags = ['a', 'h1']
    cleaner.remove_tags = ['p']
    

    Note the difference between kill and remove:

    remove_tags:
      A list of tags to remove. Only the tags will be removed, their content will get pulled up into the parent tag.
    kill_tags:
      A list of tags to kill. Killing also removes the tag's content, i.e. the whole subtree, not just the tag itself.
    allow_tags:
      A list of tags to include (default include all).
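
The kill/remove distinction described above can be sketched on a small fragment. This is an illustrative example under stated assumptions, not part of the lxml documentation: FRAGMENT is a hypothetical input, and the exact whitespace of the output is not guaranteed.

```python
from lxml.html.clean import Cleaner  # on lxml 5.2+, needs the lxml_html_clean package

# Hypothetical fragment used only to show the two behaviours side by side.
FRAGMENT = '<div><h1>Title</h1><p>Some <a href="/x">link text</a> here.</p></div>'

cleaner = Cleaner(
    kill_tags=['a', 'h1'],  # drop these elements together with their content
    remove_tags=['p'],      # drop only the tag; its content moves up to the parent
)

result = cleaner.clean_html(FRAGMENT)
print(result)
```

Here the <h1> and <a> elements disappear along with "Title" and "link text", while the text that was inside <p> stays in the enclosing <div>.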
    

© 2021 The Archive Base. All Rights Reserved