The Archive Base

Editorial Team
Asked: May 25, 2026 (2026-05-25T17:23:52+00:00)


First things first: I know there are already many questions about Python and lxml on StackOverflow, and I have read most of them, if not all. In this question I am looking for a more comprehensive answer.

I am doing some HTML conversion and need to programmatically parse the HTML and then change some content, such as href and img attributes.

This is a simplified version of what I have right now:

from lxml.html import fromstring

with open(fileName, "r") as inFile:  # fileName: path to the HTML file
    inputS = inFile.read()

myTree = fromstring(inputS)  # parse the HTML content into an element tree

breadCrumb = myTree.get_element_by_id("breadcrumb")  # the single element with this id (raises KeyError if absent)
breadCrumbContent = breadCrumb.text_content().strip()  # text content of the breadcrumb

h1 = myTree.xpath('//h1')  # another way: a list of elements matching an XPath expression
h1Content = h1[0].text_content().strip()  # text content of the first h1

getTail = myTree.cssselect('table.results > tr > td > a + span + br')  # list of elements matching a CSS selector (needs the cssselect package)

So that is basically what I know at the moment. Are there any other ways to get elements or attributes using lxml? I know these may not be the best ways to do it, but bear with me; I am new to this whole thing.

Following is what I want to do. I have:

<img src="images/macmail10.gif" alt="" width="555" height="485" /><br />
<a href="http://www.some_url.com/faq/general_faq.html" target="_blank">General FAQs page</a>

They can be nested inside other elements such as div or p. I want to look for those elements programmatically. For each image, I want to extract the src, manipulate it, and set src to something else (for example, turn src="images/something.jpg" into src="something_images.jpg"). The same goes for href: I want to change it so that it points somewhere else.

Other than that, I also want to remove some elements from the tree to simplify it, for example:

<head>
    <title>something goes here</title>
</head>
<div>
    <p id="some_p"> Some content </p>
</div>

I want to remove the head node and the div. I am able to get the p with id="some_p"; is there any way to grab its parent element? Is there also a way to remove those elements? (In this case: look for head and remove it, then look for id="some_p", get its parent, and delete that.)

Thank you!

==================================================

UPDATE: I have already found a solution and finished coding it using lxml.etree. I will post it as an answer as soon as Stack Overflow allows me to. I truly hope that the answer to this question will help other people who have to deal with HTML parsing!

1 Answer

  1. Editorial Team
    Added an answer on May 25, 2026 at 5:23 pm (2026-05-25T17:23:52+00:00)

    lxml and ElementTree are quite similar. The ElementTree portion of the lxml documentation site, in fact, just points to ElementTree’s documentation.

    You might try working through the ElementTree tutorials and examples at the bottom of the overview page. Since ElementTree is part of the Python distribution, it tends to be widely documented (and easily Googled). Once you grok that, extend it with some of the lxml magic not found in ElementTree if you need to. For example, lxml maintains parent relationships for every element and ElementTree does not. You can add parent relationships to ElementTree, but it is not an easy example to start with.

    That's how I learned it.
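To make the point about parent relationships concrete, here is a minimal sketch that uses only the standard-library ElementTree. The sample markup is adapted from the question and wrapped in html and body tags so that it parses as well-formed XML; the `SAMPLE` and `parent_of` names are made up for illustration. Because ElementTree elements carry no parent pointer, one common workaround is to build a child-to-parent map once and look parents up in it.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample, adapted from the question and made well-formed XML.
SAMPLE = """<html>
<head><title>something goes here</title></head>
<body><div><p id="some_p"> Some content </p></div></body>
</html>"""

root = ET.fromstring(SAMPLE)

# ElementTree has no getparent(), so build a child -> parent lookup table.
parent_of = {child: parent for parent in root.iter() for child in parent}

# Find the <p> by its id, then walk up to its parent <div> and to <body>.
p = root.find(".//p[@id='some_p']")
div = parent_of[p]
body = parent_of[div]

# Removal always goes through the parent of the element being removed.
body.remove(div)
root.remove(root.find("head"))

remaining_tags = [el.tag for el in root.iter()]
print(remaining_tags)
```

The map is a snapshot: it is not updated when the tree changes, so it should be rebuilt after structural edits if further parent lookups are needed.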
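For the concrete tasks in the question, the following is one possible sketch using lxml.html, not necessarily the solution the asker ended up with. The `SAMPLE` markup is assembled from the question's snippets, and `rewrite_src`, `rewrite_href`, and the `docs.example.com` host are hypothetical stand-ins for whatever rewriting rules are actually needed.

```python
import posixpath
from urllib.parse import urlsplit

from lxml import html

# Hypothetical sample assembled from the snippets in the question.
SAMPLE = """<html>
<head><title>something goes here</title></head>
<body>
<div><p id="some_p"> Some content </p></div>
<p><img src="images/macmail10.gif" alt="" width="555" height="485" /><br />
<a href="http://www.some_url.com/faq/general_faq.html" target="_blank">General FAQs page</a></p>
</body>
</html>"""


def rewrite_src(src):
    """Hypothetical rule: 'images/name.ext' becomes 'name_images.ext'."""
    stem, ext = posixpath.splitext(posixpath.basename(src))
    return f"{stem}_images{ext}"


def rewrite_href(href):
    """Hypothetical rule: keep the path, point the link at another host."""
    return urlsplit(href)._replace(netloc="docs.example.com").geturl()


tree = html.fromstring(SAMPLE)

# Read and rewrite attributes with get() and set().
for img in tree.iter("img"):
    src = img.get("src")
    if src:
        img.set("src", rewrite_src(src))

for a in tree.iter("a"):
    href = a.get("href")
    if href:
        a.set("href", rewrite_href(href))

# Remove <head>: lxml elements know their parent via getparent().
head = tree.find("head")
if head is not None:
    head.getparent().remove(head)

# Find the <p> by id, grab its parent <div>, and remove that <div>.
p = tree.get_element_by_id("some_p")
div = p.getparent()
div.getparent().remove(div)

print(html.tostring(tree, encoding="unicode"))
```

One caveat: in lxml, `remove()` also discards the removed element's tail text (the text that follows its closing tag). When that text must be kept, `drop_tree()` on lxml.html elements is an alternative that joins the tail to the preceding content.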

© 2021 The Archive Base. All Rights Reserved