The Archive Base

Editorial Team
Asked: June 16, 2026

I have a large xml file (about 84MB) which is in this form:

<books>
    <book>...</book>
    ....
    <book>...</book>
</books>

My goal is to extract every single book and get its properties. I tried to parse it (as I did with other xml files) as follows:

from xml.dom.minidom import parse, parseString

fd = "myfile.xml"
parser = parse(fd)
## other python code here

but the code seems to fail at the parse call. Why is this happening, and how can I solve it?

I should point out that the file may contain Greek, Spanish and Arabic characters.

This is the output I got in IPython:

In [2]: fd = "myfile.xml"

In [3]: parser = parse(fd)
Killed

I would like to point out that the computer freezes during the execution, so this may be related to memory consumption as stated below.

1 Answer

  1. Editorial Team
    Added an answer on June 16, 2026 at 6:13 pm

    I would strongly recommend using a SAX parser here. I wouldn’t recommend using minidom on any XML document larger than a few megabytes; I’ve seen it use about 400MB of RAM reading in an XML document that was about 10MB in size. I suspect the problems you are having are being caused by minidom requesting too much memory.

    Python comes with an XML SAX parser. To use it, do something like the following.

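    from xml.sax import parse
    from xml.sax.handler import ContentHandler  # the module is "handler", not "handlers"

    class MyContentHandler(ContentHandler):
        # override various ContentHandler methods as needed...
        pass  # placeholder so the class body is valid until methods are added


    handler = MyContentHandler()
    parse("mydata.xml", handler)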

    Your ContentHandler subclass will override various methods of ContentHandler (such as startElement, startElementNS, endElement, endElementNS or characters). These methods handle the events generated by the SAX parser as it reads your XML document.
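To make the callbacks above concrete, here is a minimal sketch that only records the events the parser fires. The EventLogger class, the SAMPLE document and its id attribute and <title> child are assumptions made for illustration; they do not come from the original file.

```python
import xml.sax
from xml.sax.handler import ContentHandler


class EventLogger(ContentHandler):
    """Records each SAX callback so the order of events is visible."""

    def __init__(self):
        super().__init__()
        self.events = []

    def startElement(self, name, attrs):
        # attrs maps attribute names to values for this element
        self.events.append(("start", name, dict(attrs.items())))

    def endElement(self, name):
        self.events.append(("end", name))

    def characters(self, content):
        # Text may arrive in several chunks; whitespace-only chunks are skipped
        if content.strip():
            self.events.append(("text", content.strip()))


# A tiny stand-in for the real file (hypothetical structure)
SAMPLE = b'<books><book id="1"><title>Dune</title></book></books>'

logger = EventLogger()
xml.sax.parseString(SAMPLE, logger)
for event in logger.events:
    print(event)
```

Start and end events arrive strictly in document order, which is what lets a handler work out where it is without ever holding the whole tree.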

    SAX is a more ‘low-level’ way to handle XML than DOM; in addition to pulling out the relevant data from the document, your ContentHandler will need to do work keeping track of what elements it is currently inside. On the upside, however, as SAX parsers don’t keep the whole document in memory, they can handle XML documents of potentially any size, including those larger than yours.
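One possible way to do that bookkeeping is to keep a stack of the currently open element names. The sketch below assumes each <book> has simple child elements holding text (the <title> and <year> names are invented for the example), and BookHandler and its on_book callback are hypothetical names, not part of xml.sax.

```python
import xml.sax
from xml.sax.handler import ContentHandler


class BookHandler(ContentHandler):
    """Builds one dict per <book>, tracking the current position with a stack."""

    def __init__(self, on_book):
        super().__init__()
        self.on_book = on_book  # callback invoked once per finished book
        self.path = []          # names of the elements currently open
        self.book = None        # dict for the book being read, if any
        self.text = []          # text chunks of the current element

    def startElement(self, name, attrs):
        self.path.append(name)
        if self.path == ["books", "book"]:
            self.book = {}
        self.text = []

    def characters(self, content):
        # May be called several times for one text node, so accumulate
        self.text.append(content)

    def endElement(self, name):
        if self.book is not None and len(self.path) == 3:
            # A direct child of <book> has just closed: store its text
            self.book[name] = "".join(self.text).strip()
        elif self.path == ["books", "book"]:
            self.on_book(self.book)  # hand the record off, then forget it
            self.book = None
        self.path.pop()
        self.text = []


# Hypothetical sample; the real file's child elements may differ
SAMPLE = b"""<books>
    <book><title>Dune</title><year>1965</year></book>
    <book><title>Emma</title><year>1815</year></book>
</books>"""

books = []
xml.sax.parseString(SAMPLE, BookHandler(books.append))
print(books)
```

For the real file, something like xml.sax.parse("myfile.xml", BookHandler(process_book)) should work, where process_book is whatever you want done with each record. Passing a callback rather than appending every book to a list keeps memory use roughly constant regardless of file size.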

    I haven’t tried using other DOM-style parsers such as lxml on XML documents of this size, but I suspect that lxml will still take considerable time and memory to parse your XML document. That could slow down your development if you have to wait for an 84MB XML document to load every time you run your code.

    Finally, I don’t believe the Greek, Spanish and Arabic characters you mention will cause a problem.
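As a quick check of that point, the sketch below feeds a small UTF-8 document containing Greek, Spanish and Arabic titles through the SAX parser. The TitleCollector class and the sample titles are illustrative assumptions. The one thing to watch in a real file is that the encoding named in its XML declaration matches the bytes actually on disk.

```python
import xml.sax
from xml.sax.handler import ContentHandler


class TitleCollector(ContentHandler):
    """Collects the text of every <title> element."""

    def __init__(self):
        super().__init__()
        self.titles = []
        self.chunks = None  # a list while inside <title>, otherwise None

    def startElement(self, name, attrs):
        if name == "title":
            self.chunks = []

    def characters(self, content):
        if self.chunks is not None:
            self.chunks.append(content)

    def endElement(self, name):
        if name == "title":
            self.titles.append("".join(self.chunks))
            self.chunks = None


# Illustrative sample; the declared encoding matches the encoded bytes
document = (
    '<?xml version="1.0" encoding="UTF-8"?>'
    "<books>"
    "<book><title>Οδύσσεια</title></book>"
    "<book><title>Cien años de soledad</title></book>"
    "<book><title>ألف ليلة وليلة</title></book>"
    "</books>"
).encode("utf-8")

collector = TitleCollector()
xml.sax.parseString(document, collector)
print(collector.titles)
```

The handler receives ordinary Python str objects, so no extra decoding step is needed inside the callbacks.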

© 2021 The Archive Base. All Rights Reserved