The Archive Base

Editorial Team
Asked: May 18, 2026


I have some SGML files that are roughly standardized. However, a file can contain data inside a tag that I do not know exists until I open the file and read it myself. For example, the files have addresses, and an address generally has a street, a city, a state, a zip and a phone. Each element of the address is indicated with a tag:

 <ADDRESS>
 <STREET>One Main Street
 <CITY>Gotham City
 <ZIP>99999 0123
 <PHONE>555-123-5467
 </ADDRESS>

But I have discovered, for example, that there are also tags for Country, STREET1 and STREET2. I have over 200K files to process, and I want to know whether it is possible to pull out all of the elements of the addresses without knowing in advance which tags exist.

What I have done so far is:

from lxml.html import fromstring  # assumed import, implied by cssselect()/text_content()

h = fromstring(my_data_in_a_string)
for each in h.cssselect('mail_address'):
    print(each.text_content())

But the result is problematic, because I can’t tell where one element ends and the next begins:

One Main StreetGotham City99999 0123555-123-5467

1 Answer

  1. Editorial Team
    Added an answer on May 18, 2026 at 3:15 am

    To get all the tags, iterate through the document. Suppose your XML structure is like this:

    <ADDRESS>
     <STREET>One Main Street</STREET>
     <CITY>Gotham City</CITY>
     <ZIP>99999 0123</ZIP>
     <PHONE>555-123-5467</PHONE>
     </ADDRESS>
    

    We parse it:

    >>> from lxml import etree
    >>> f = etree.parse('foo.xml')  # path to the XML file
    >>> root = f.getroot()          # get the root element
    >>> for tags in root.iter():    # walk the root element and all its descendants
    ...     print(tags.tag)         # print each tag name
    ... 
    ADDRESS
    STREET
    CITY
    ZIP
    PHONE
    

    Now suppose your XML also has extra tags that you are not aware of. Because the code iterates over every element in the document, it returns those tags as well.

    <ADDRESS>
     <STREET>One Main Street</STREET>
     <STREET1>One Second Street</STREET1>
     <CITY>Gotham City</CITY>
     <ZIP>99999 0123</ZIP>
     <PHONE>555-123-5467</PHONE>
     <COUNTRY>USA</COUNTRY>
    </ADDRESS>
    

    The above code returns:

    ADDRESS
    STREET
    STREET1
    CITY
    ZIP
    PHONE
    COUNTRY
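
With many files to process, the same iteration can be used to survey which tags occur at all. The following is only a minimal sketch, assuming each document is well-formed XML (closing tags present, as in the example above). The function name `count_tags` and the `samples` list are hypothetical, and the standard library's `xml.etree.ElementTree` is used as a fallback when lxml is not installed, since it offers the same `iter()` and `.tag` interface for this purpose.

```python
from collections import Counter

try:
    from lxml import etree  # parser used in the answer
except ImportError:
    import xml.etree.ElementTree as etree  # stdlib fallback, same iter()/.tag usage


def count_tags(documents):
    """Count how often each tag name occurs across an iterable of XML strings."""
    counts = Counter()
    for doc in documents:
        root = etree.fromstring(doc)
        for element in root.iter():
            # lxml can also yield comments and processing instructions,
            # whose .tag is not a string; skip those.
            if isinstance(element.tag, str):
                counts[element.tag] += 1
    return counts


# Hypothetical sample documents standing in for the real files.
samples = [
    "<ADDRESS><STREET>One Main Street</STREET><CITY>Gotham City</CITY></ADDRESS>",
    "<ADDRESS><STREET1>One Second Street</STREET1><COUNTRY>USA</COUNTRY></ADDRESS>",
]
tag_counts = count_tags(samples)
print(sorted(tag_counts))
```

A count per tag makes rare tags such as STREET1 or COUNTRY visible without reading any file by hand.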
    

    To get the text of each tag, the procedure is the same. Just print tags.text instead (the first blank line of output is the whitespace-only text of the ADDRESS element itself):

    >>> for tags in root.iter():
    ...     print(tags.text)
    ... 
    
    One Main Street
    One Second Street
    Gotham City
    99999 0123
    555-123-5467
    USA
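
To keep each value attached to its tag, so that it is clear where one element ends and the next begins, the tag and text can be collected together. This is a sketch under the same assumption of well-formed XML. The helper name `address_fields` and the inline `xml_data` string are made up for illustration, and `xml.etree.ElementTree` is used as a fallback when lxml is not available.

```python
try:
    from lxml import etree  # parser used in the answer
except ImportError:
    import xml.etree.ElementTree as etree  # stdlib fallback


def address_fields(address):
    """Return (tag, text) pairs for every child element of one ADDRESS element."""
    fields = []
    for child in address:
        if not isinstance(child.tag, str):
            continue  # skip comments / processing instructions (lxml only)
        text = (child.text or "").strip()  # .text is None for empty elements
        fields.append((child.tag, text))
    return fields


xml_data = """<ADDRESS>
  <STREET>One Main Street</STREET>
  <STREET1>One Second Street</STREET1>
  <CITY>Gotham City</CITY>
  <ZIP>99999 0123</ZIP>
  <PHONE>555-123-5467</PHONE>
  <COUNTRY>USA</COUNTRY>
</ADDRESS>"""

root = etree.fromstring(xml_data)
fields = address_fields(root)
for tag, text in fields:
    print(tag + ": " + text)
```

Because the loop never names a specific tag, unknown tags end up in the result the same way as the expected ones.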
    