The Archive Base

The Archive Base Logo The Archive Base Logo

The Archive Base Navigation

  • SEARCH
  • Home
  • About Us
  • Blog
  • Contact Us
Search
Ask A Question

Mobile menu

Close
Ask a Question
  • Home
  • Add group
  • Groups page
  • Feed
  • User Profile
  • Communities
  • Questions
    • New Questions
    • Trending Questions
    • Must read Questions
    • Hot Questions
  • Polls
  • Tags
  • Badges
  • Buy Points
  • Users
  • Help
  • Buy Theme
  • SEARCH
Home/ Questions/Q 8245711
In Process

The Archive Base Latest Questions

Editorial Team
Asked: June 7, 2026

I’d like to parse and compare 2 XML files with the Python Etree parser

I’d like to parse and compare 2 XML files with the Python Etree parser as follows:

I have 2 XML files with loads of data. One is in English (the source file), the other one the corresponding French translation (the target file).
E.g.:

source file:

<AB>
  <CD/>
  <EF>

    <GH>
      <id>123</id>
      <IJ>xyz</IJ>
      <KL>DOG</KL>
      <MN>dogs/dog</MN>
      some more tags and info on same level
      <metadata>
        <entry>
           <cl>Translation</cl>
           <cl>English:dog/dogs</cl>
        </entry>
        <entry>
           <string>blabla</string>
           <string>blabla</string>
        </entry>
            some more strings and entries
      </metadata>
    </GH>

  </EF>
  <stuff/>
  <morestuff/>
  <otherstuff/>
  <stuffstuff/>
  <blubb/>
  <bla/>
  <blubbbla>8</blubbbla>
</AB>

The target file looks exactly the same, but has no text in some places:

<MN>chiens/chien</MN>
some more tags and info on same level
<metadata>
  <entry>
    <cl>Translation</cl>
    <cl></cl>
  </entry>

The French target file has an empty cross-language reference where I’d like to put in the information from the English source file whenever the 2 macros have the same ID.
I already wrote some code in which I replaced the string tag name with a unique tag name in order to identify the cross-language reference. Now I want to compare the 2 files and if 2 macros have the same ID, exchange the empty reference in the French file with the info from the English file. I was trying out the minidom parser before but got stuck and would like to try Etree now. I have hardly any knowledge about programming and find this very hard.
Here is the code I have so far:

    macros = ElementTree.parse(english)

    for tag in macros.getchildren('macro'):
        id_ = tag.find('id')
        data = tag.find('cl')
        id_dict[id_.text] = data.text

    macros = ElementTree.parse(french)

    for tag in macros.getchildren('macro'):
        id_ = tag.find('id')
        target = tag.find('cl')
        if target.text.strip() == '':
            target.text = id_dict[id_.text]

    print (ElementTree.tostring(macros))

I am more than clueless and reading other posts on this confuses me even more. I’d appreciate it very much if someone could enlighten me 🙂

1 Answer

  1. Editorial Team
    Added an answer on June 7, 2026 at 10:16 pm

    There are probably more details to clarify. Here is a sample with some debug prints that shows the idea. It assumes that both files have exactly the same structure, and that you want to go only one level below the root:

    import xml.etree.ElementTree as etree
    
    english_tree = etree.parse('en.xml')
    french_tree = etree.parse('fr.xml')
    
    # Get the root elements, as they support iteration
    # through their children (direct descendants)
    english_root = english_tree.getroot()
    french_root = french_tree.getroot()
    
    # Iterate through the direct descendants of the root
    # elements in both trees in parallel.
    for en, fr in zip(english_root, french_root):
        assert en.tag == fr.tag  # check for the same structure
        if en.tag == 'id':
            assert en.text == fr.text  # check for the same id
    
        elif en.tag == 'string':
            if fr.text is None:
                fr.text = en.text
                print(en.text)  # display what was replaced
    
    etree.dump(french_tree)
    

    For more complex structures of the file, the loop through the direct children of the node can be replaced by iteration through all the elements of the tree. If the structures of the files are exactly the same, the following code will work:

    import xml.etree.ElementTree as etree
    
    english_tree = etree.parse('en.xml')
    french_tree = etree.parse('fr.xml')
    
    for en, fr in zip(english_tree.iter(), french_tree.iter()):
        assert en.tag == fr.tag        # check if the structure is the same
        if en.tag == 'id':
            assert en.text == fr.text  # identification must be the same
        elif en.tag == 'string':
            if fr.text is None:
                fr.text = en.text
                print(en.text)         # display the inserted text
    
    # Write the result to the output file. encoding='unicode' makes
    # tostring() return str rather than bytes, as a text-mode file requires.
    with open('fr2.xml', 'w', encoding='utf-8') as fout:
        fout.write(etree.tostring(french_tree.getroot(), encoding='unicode'))
    

    However, this works only when both files have exactly the same structure. Let’s instead follow the algorithm you would use to do the task manually. First, find each French translation that is empty. Then replace it with the English translation from the GH element that has the same identification. The search for the elements uses the subset of XPath expressions that ElementTree supports:

    import xml.etree.ElementTree as etree
    
    def find_translation(tree, id_):
        # Search for the GH element with the given identification, and return
        # its translation if found. Otherwise None is returned implicitly.
        for gh in tree.iter('GH'):
            id_elem = gh.find('./id')
            if id_ == id_elem.text:
                # The related GH element found.
                # Find metadata entry, extract the translation.
                # Warning! This is a simplification that relies on the fixed
                # position of the Translation entry.
                me = gh.find('./metadata/entry')
                assert len(me) == 2     # metadata/entry has two elements
                cl1 = me[0]
                assert cl1.text == 'Translation'
                cl2 = me[1]
    
                return cl2.text
    
    
    # Body of the program. --------------------------------------------------
    
    english_tree = etree.parse('en.xml')
    french_tree = etree.parse('fr.xml')
    
    for gh in french_tree.iter('GH'):  # iterate through the GH elements only
        # Get the identification of the GH section
        id_elem = gh.find('./id')
        id_ = id_elem.text
    
        # Find and check the metadata entry, locate the French translation.
        # Warning! This is a simplification that relies on the fixed position
        # of the Translation entry.
        me = gh.find('./metadata/entry')
        assert len(me) == 2     # metadata/entry has two elements
        cl1 = me[0]
        assert cl1.text == 'Translation'
        cl2 = me[1]
    
        # If the French translation is empty, put there the English translation
        # from the related element.
        if cl2.text is None:
            cl2.text = find_translation(english_tree, id_)
    
    
    with open('fr2.xml', 'w') as fout:
        fout.write(etree.tostring(french_tree.getroot()).decode('utf-8'))
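
The lookup above rescans the English tree for every empty French entry. If the files are large, one possible variation is to build an id-to-translation dictionary once, much like the `id_dict` in the question. The following is only a sketch under assumptions: the helper names `translation_cl` and `fill_missing_translations` are made up for illustration, the inline XML strings stand in for `en.xml` and `fr.xml`, and a whitespace-only `<cl>` is treated as empty, which the code above does not do.

```python
import xml.etree.ElementTree as etree


def translation_cl(gh):
    """Return the second <cl> of the 'Translation' entry of a GH element, or None."""
    for entry in gh.iterfind('./metadata/entry'):
        cls = entry.findall('cl')
        if len(cls) == 2 and cls[0].text == 'Translation':
            return cls[1]
    return None


def fill_missing_translations(english_root, french_root):
    """Copy English cross-references into empty French ones; return the ids changed."""
    # Build the id -> English text lookup once.
    english_by_id = {}
    for gh in english_root.iter('GH'):
        cl = translation_cl(gh)
        if cl is not None:
            english_by_id[gh.findtext('id')] = cl.text

    changed = []
    for gh in french_root.iter('GH'):
        cl = translation_cl(gh)
        id_ = gh.findtext('id')
        # Treat missing text and whitespace-only text as empty (an assumption).
        if cl is not None and not (cl.text or '').strip() and id_ in english_by_id:
            cl.text = english_by_id[id_]
            changed.append(id_)
    return changed


# Tiny inline samples (made up for illustration) instead of en.xml / fr.xml.
# The GH elements are deliberately in a different order in the two files.
EN = """<AB><EF>
  <GH><id>123</id><metadata><entry><cl>Translation</cl><cl>English:dog/dogs</cl></entry></metadata></GH>
  <GH><id>456</id><metadata><entry><cl>Translation</cl><cl>English:cat/cats</cl></entry></metadata></GH>
</EF></AB>"""
FR = """<AB><EF>
  <GH><id>456</id><metadata><entry><cl>Translation</cl><cl>deja rempli</cl></entry></metadata></GH>
  <GH><id>123</id><metadata><entry><cl>Translation</cl><cl></cl></entry></metadata></GH>
</EF></AB>"""

english_root = etree.fromstring(EN)
french_root = etree.fromstring(FR)
changed_ids = fill_missing_translations(english_root, french_root)
print(changed_ids)  # prints ['123']
```

Because the match is made by id rather than by position, the two files do not need their GH elements in the same order, and the Translation entry does not have to be the first entry under metadata.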
    