The Archive Base

Asked by Editorial Team on June 1, 2026

My aim is to parse 25 GB of XML data. An example of such data is given below:

<Document>
<Data Id='12' category='1'  Body="abc"/>
<Data Id='13' category='1'  Body="zwq"/>
.
.
<Data Id='82018030' category='2' CorrespondingCategory1Id='13' Body="pqr"/>

However, given that I have 25 GB of data, my approach is quite inefficient. Please suggest a way to improve my code or an alternative approach, and kindly include a small code example to make things clearer.

1 Answer

    Answered by Editorial Team on June 1, 2026 at 1:05 pm

    You might find that a SAX parser works better for this task. Rather than building a DOM, a SAX parser turns the XML file into a stream of elements and calls functions you provide to let you handle each element.
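    A minimal sketch of that callback style with Python's built-in xml.sax, run on the question's sample. It assumes the real file is closed by a </Document> tag (added here so the snippet is well-formed); the DataHandler class and its on_record callback are hypothetical names for illustration, not part of any library.

```python
import xml.sax


class DataHandler(xml.sax.ContentHandler):
    """Hands every <Data> element to a callback as soon as the parser reports it."""

    def __init__(self, on_record):
        super().__init__()
        self.on_record = on_record  # hypothetical callback: receives one dict per <Data>

    def startElement(self, name, attrs):
        # The parser calls this for every opening (or self-closing) tag.
        if name == "Data":
            self.on_record({key: attrs[key] for key in attrs.getNames()})


# The question's sample, with a closing </Document> added so it is well-formed.
SAMPLE = b"""<Document>
<Data Id='12' category='1'  Body="abc"/>
<Data Id='13' category='1'  Body="zwq"/>
<Data Id='82018030' category='2' CorrespondingCategory1Id='13' Body="pqr"/>
</Document>"""

seen = []
xml.sax.parseString(SAMPLE, DataHandler(seen.append))
for record in seen:
    print(record["Id"], record["category"], record["Body"])
```

    Here the callback just appends to a list so the result is easy to inspect; on the full 25 GB file you would process or write out each record inside the callback instead of keeping them all.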

    The good thing is that SAX parsers can be very fast and memory-efficient compared to DOM parsers, and some don’t even need to be given all the XML at once, which would be ideal when you have 25 GB of it.
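    To illustrate the memory point, here is a sketch that keeps only running totals while streaming a file from disk. When given a path, xml.sax.parse opens the file itself and feeds the parser in fixed-size buffers, so the whole document is never held in memory. The write_sample generator and its record layout are hypothetical stand-ins for the real 25 GB file.

```python
import os
import tempfile
import xml.sax


class CategoryCounter(xml.sax.ContentHandler):
    """Keeps only running totals, so memory use does not grow with file size."""

    def __init__(self):
        super().__init__()
        self.total = 0
        self.per_category = {}

    def startElement(self, name, attrs):
        if name == "Data":
            self.total += 1
            cat = attrs.get("category")
            self.per_category[cat] = self.per_category.get(cat, 0) + 1


def write_sample(path, n):
    # Hypothetical generator: n records shaped like the question's sample.
    with open(path, "w", encoding="utf-8") as f:
        f.write("<Document>\n")
        for i in range(n):
            f.write(f"<Data Id='{i}' category='{i % 2 + 1}' Body=\"b{i}\"/>\n")
        f.write("</Document>\n")


def count_file(path):
    handler = CategoryCounter()
    # Given a path, xml.sax.parse opens the file and feeds it to the parser
    # in fixed-size buffers rather than reading it all into memory first.
    xml.sax.parse(path, handler)
    return handler.total, handler.per_category


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "data.xml")
        write_sample(path, 100_000)
        print(count_file(path))
```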

    Unfortunately, if you need any context information, like "I want tag <B> but only if it's inside tag <A>," you must maintain it yourself, since all the parser gives you is "start tag <A>, start tag <B>, end tag <B>, end tag <A>." It never explicitly tells you that tag <B> is inside tag <A>; you have to figure that out from what you have seen. And once you have seen an element, it's gone unless you remembered it yourself.
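    One common way to keep that context is a stack of the currently open element names, pushed in startElement and popped in endElement. The sketch below collects the text of <B> only when some ancestor is <A>; the element names and the sample document are made up for the example.

```python
import xml.sax


class BInsideAHandler(xml.sax.ContentHandler):
    """Collects the text of <B> elements that have an <A> ancestor."""

    def __init__(self):
        super().__init__()
        self.stack = []             # names of the elements currently open
        self.capture_depth = None   # stack depth where the captured <B> started
        self.buffer = []
        self.results = []

    def startElement(self, name, attrs):
        if name == "B" and "A" in self.stack and self.capture_depth is None:
            self.capture_depth = len(self.stack)
            self.buffer = []
        self.stack.append(name)

    def characters(self, content):
        # Text may arrive in several pieces, so accumulate it.
        if self.capture_depth is not None:
            self.buffer.append(content)

    def endElement(self, name):
        self.stack.pop()
        if self.capture_depth == len(self.stack):
            self.results.append("".join(self.buffer))
            self.capture_depth = None


DOC = b"<root><A><B>1</B></A><B>2</B><A><C><B>3</B></C></A></root>"
handler = BInsideAHandler()
xml.sax.parseString(DOC, handler)
print(handler.results)
```

    The second <B> is skipped because nothing on the stack says it is inside an <A>; the parser itself never reports that relationship.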

    This gets very hairy for complex parsing jobs, but yours is probably manageable.

    It happens that Python’s standard library has a SAX parser in xml.sax. You probably want something like xml.sax.xmlreader.IncrementalParser.
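    A sketch of pushing data into an incremental parser by hand. xml.sax.make_parser() returns the default expat-based reader, which implements IncrementalParser, so you can call feed() with chunks of any size and close() at the end. The parse_in_chunks helper and IdCollector class are hypothetical names; the tiny chunk size is only there to show that chunk boundaries can fall in the middle of a tag.

```python
import io
import xml.sax


class IdCollector(xml.sax.ContentHandler):
    def __init__(self):
        super().__init__()
        self.ids = []

    def startElement(self, name, attrs):
        if name == "Data":
            self.ids.append(attrs.get("Id"))


def parse_in_chunks(stream, handler, chunk_size=64 * 1024):
    """Push a binary stream into an incremental SAX parser one chunk at a time."""
    parser = xml.sax.make_parser()  # default expat reader is an IncrementalParser
    parser.setContentHandler(handler)
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        parser.feed(chunk)  # boundaries may fall anywhere, even mid-tag
    parser.close()  # signals end of input and checks the document is complete


SAMPLE = b"""<Document>
<Data Id='12' category='1'  Body="abc"/>
<Data Id='13' category='1'  Body="zwq"/>
<Data Id='82018030' category='2' CorrespondingCategory1Id='13' Body="pqr"/>
</Document>"""

handler = IdCollector()
# For a real file, pass open(path, "rb") and keep the default chunk size.
parse_in_chunks(io.BytesIO(SAMPLE), handler, chunk_size=7)
print(handler.ids)
```

    Feeding by hand is mainly useful when the data does not come from a plain file (a socket, a decompressor, and so on); for an ordinary file, xml.sax.parse already does this loop for you.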


Related Questions

I have a simple xml below: <?xml version=1.0 encoding=utf-8?> <catalogue> <category name=textbook id=100 parent=books>
AIM: I would like to set the below function to call every 5 seconds.
AIM : Convert a resultset to XML and assign the result to a variable
My aim is to retrieve some data from a global array which is defined
I'm attempting to parse XML in the following format (from the European Central Bank
I'm attempting to parse xml containing foreign letters (æøå specifically), however I'm having problems
I have GPS data stored as a .tcx file. This is a xml file
Aim to achieve : I want to change the source data for my pivot
Hi the aim is to parse a sizeable corpus like wikipedia to generate the
Aim I am looking to scrape 20/20 cricket scorecard data from the Cricinfo website

© 2021 The Archive Base. All Rights Reserved