The Archive Base


Editorial Team
Asked: June 6, 2026


In answering another question, someone showed me the following tutorial, in which the author claims to have used iterparse to parse a ~100 MB XML file in under 3 seconds:

http://eli.thegreenplace.net/2012/03/15/processing-xml-in-python-with-elementtree/

I am trying to parse an ~90 MB XML file, and I have the following code:

from xml.etree.cElementTree import iterparse

count = 0

for event, elem in iterparse('foo.xml'):
    if elem.tag == 'identifier' and elem.text == 'bar':
        count += 1
    elem.clear()  # discard the element

print count

It takes about thirty seconds, roughly an order of magnitude slower than the time reported in the tutorial for a similarly sized file, a similar algorithm, and the same package.

Could someone please tell me what might be wrong with my code, or what differences between my situation and the tutorial I might be missing?

I am using Python 2.7.3.

Addendum:

I am also using a reasonably powerful machine, in case anyone thinks that might be it.


1 Answer

  1. Editorial Team
    Added an answer on June 6, 2026 at 1:17 pm

    As TJD mentioned, comparing XML files by size alone may not be very informative. However, I happen to have files of the same structure but different sizes:

    With a 79M file:

    $ python -m timeit -n 1 -c 'from xml.etree.cElementTree import iterparse
    count = 0
    for event, elem in iterparse("..../QT20060217_S_18mix23-2500_01.mzML"):
        if elem.tag.endswith("spectrum"): count += 1
        elem.clear()
    print count'
    6126
    6126
    6126
    1 loops, best of 3: 950 msec per loop
    

    With a 3.8G file the timeit output is:

    1 loops, best of 3: 22.3 sec per loop
    

    For comparison with lxml: changing xml.etree.cElementTree in the first line to lxml.etree, I get:

    for the first file: 730 msec per loop

    for the second file: 11.4 sec per loop
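To check how much structure matters compared with raw size, one option is to time iterparse on two synthetic documents of equal byte size but very different element counts. The sketch below is only an illustration under stated assumptions: it targets Python 3 and xml.etree.ElementTree (cElementTree was removed in Python 3.9), and make_xml, count_tags and time_parse are hypothetical helper names, not part of any library. The generated data is artificial and says nothing about the files measured above.

```python
import io
import time
import xml.etree.ElementTree as ET


def make_xml(n_elements, payload_len):
    """Build a synthetic XML document as bytes (hypothetical test data)."""
    item = "<identifier>" + "x" * payload_len + "</identifier>"
    return ("<root>" + item * n_elements + "</root>").encode("utf-8")


def count_tags(data, tag):
    """Count elements named `tag`, clearing each element once it is seen."""
    count = 0
    # By default iterparse reports only "end" events.
    for event, elem in ET.iterparse(io.BytesIO(data)):
        if elem.tag == tag:
            count += 1
        elem.clear()  # drop text and children to limit memory use
    return count


def time_parse(data, tag):
    """Return (element count, elapsed seconds) for one parsing pass."""
    start = time.perf_counter()
    count = count_tags(data, tag)
    return count, time.perf_counter() - start


# Same total size in bytes, but 100x more elements in the first document.
many_small = make_xml(n_elements=200_000, payload_len=10)
few_large = make_xml(n_elements=2_000, payload_len=3_475)

if __name__ == "__main__":
    for name, data in (("many small", many_small), ("few large", few_large)):
        count, seconds = time_parse(data, "identifier")
        print("%s: %d bytes, %d elements, %.3f s"
              % (name, len(data), count, seconds))
```

Running the script prints the byte size, element count and elapsed time for each document, so the two timings can be compared directly on your own machine.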
