Question

Asked: May 11, 2026 (2026-05-11T21:19:28+00:00)

I’m using Python’s built-in XML parser to load a 1.5 GB XML file

I’m using Python’s built-in XML parser to load a 1.5 GB XML file, and it takes all day.

from xml.dom import minidom
xmldoc = minidom.parse('events.xml')

I need a way to hook into the parse and measure its progress so I can show a progress bar.
Any ideas?

minidom has another method called parseString() that returns a DOM tree, assuming the string you pass it is valid XML. If I were to split the file into chunks myself and pass them to parseString() one at a time, could I merge all the DOM trees back together at the end?

1 Answer

Editorial Team · Answer 1 · 2026-05-11T21:19:28+00:00

Your use case calls for a SAX parser instead of DOM. DOM loads the whole document into memory; SAX parses the file incrementally and calls the handlers you write for the events you care about.
That should be much more efficient, and it also lets you write a progress indicator.
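
To give a rough idea of what a SAX handler looks like, here is a minimal sketch using the standard library's xml.sax module. The element name "event" and the names EventCounter and count_events are made up for illustration, since the question doesn't show what events.xml actually contains.

```python
import io
import xml.sax


class EventCounter(xml.sax.ContentHandler):
    """Counts elements with a given tag without building a tree in memory."""

    def __init__(self, tag="event"):  # "event" is an assumed element name
        super().__init__()
        self.tag = tag
        self.count = 0

    def startElement(self, name, attrs):
        # Called once for every opening tag the parser encounters.
        if name == self.tag:
            self.count += 1


def count_events(source, tag="event"):
    """Parse a filename or binary file object and return the tag count."""
    handler = EventCounter(tag)
    xml.sax.parse(source, handler)
    return handler.count


# Small in-memory document standing in for the real 1.5 GB file.
sample = io.BytesIO(b"<events><event id='1'/><event id='2'/></events>")
print(count_events(sample))
```

For the real file you would pass the path, e.g. count_events('events.xml'), and do your actual work inside the handler methods instead of just counting.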

I also recommend trying the expat parser sometime; it is very useful:
http://docs.python.org/library/pyexpat.html
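
If you go the expat route, one possible approach is to feed the parser yourself in chunks, which gives you a byte count for free. This is only a sketch: parse_in_chunks, on_start, and the 64 KiB chunk size are my own illustrative choices, not anything the question specifies.

```python
import io
import xml.parsers.expat


def parse_in_chunks(fileobj, on_start, chunk_size=64 * 1024):
    """Feed a binary file object to expat chunk by chunk.

    Yields the running byte count after each chunk so the caller can
    update a progress display. chunk_size is an arbitrary assumption.
    """
    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = on_start  # called as on_start(name, attrs)
    done = 0
    while True:
        chunk = fileobj.read(chunk_size)
        if not chunk:
            break
        parser.Parse(chunk, False)  # False: more data is still to come
        done += len(chunk)
        yield done
    parser.Parse(b"", True)  # True: signal the end of the document


# Tiny in-memory document standing in for the real file.
names = []
data = io.BytesIO(b"<events><event id='1'/><event id='2'/></events>")
for done in parse_in_chunks(data, lambda name, attrs: names.append(name), chunk_size=10):
    print(done, "bytes parsed")
```

Dividing the yielded count by the file size (os.path.getsize) gives the fraction for a progress bar.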

For progress using SAX:

Because SAX reads the file incrementally, you can wrap the file object you pass in with your own and keep track of how much has been read.
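
Something along these lines should work as the wrapper. It is a sketch, not a polished implementation: ProgressReader, parse_with_progress, and print_progress are names I made up, and the count is bytes read from disk, which runs slightly ahead of what the parser has actually processed because the parser reads in buffered chunks.

```python
import os
import xml.sax


class ProgressReader:
    """Wraps a binary file object and reports how many bytes were read."""

    def __init__(self, fileobj, total, callback):
        self._f = fileobj
        self._total = total
        self._callback = callback  # called as callback(bytes_read, total)
        self.bytes_read = 0

    def read(self, size=-1):
        data = self._f.read(size)
        self.bytes_read += len(data)
        self._callback(self.bytes_read, self._total)
        return data

    def close(self):
        # The SAX parser closes its input stream when it finishes.
        self._f.close()


def print_progress(done, total):
    """Example callback: overwrite one console line with a percentage."""
    percent = 100.0 * done / total if total else 100.0
    print(f"\r{percent:5.1f}% read", end="", flush=True)


def parse_with_progress(path, handler, callback=print_progress):
    """Run a SAX parse over path, reporting progress through callback."""
    total = os.path.getsize(path)
    with open(path, "rb") as f:
        xml.sax.parse(ProgressReader(f, total, callback), handler)


# Usage, with your own ContentHandler subclass doing the real work:
# parse_with_progress("events.xml", xml.sax.ContentHandler())
```

Swap print_progress for whatever updates your progress bar; the handler stays completely unaware of the progress tracking.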

Edit:
I also don’t like the idea of splitting the file yourself and joining the DOM trees at the end; at that point you would be better off writing your own XML parser. I recommend using a SAX parser instead.
I also wonder what your purpose is in reading a 1.5 GB file into a DOM tree.
It looks like SAX would be a better fit here.
