I am using python 2.7 with latest lxml library. I am parsing a large

Question

Asked: May 26, 2026 (2026-05-26T14:59:58+00:00)

I am using Python 2.7 with the latest lxml library. I am parsing a large XML file with a very homogeneous structure and millions of elements. I thought lxml’s iterparse would not build an internal tree while it parses, but apparently it does, since memory usage grows until it crashes (at around 1 GB). Is there a way to parse a large XML file using lxml without using a lot of memory?

I saw the target parser interface as one possibility, but I’m not sure if that will work any better.

1 Answer

Editorial Team · Answer 1 · 2026-05-26T14:59:59+00:00

Try using Liza Daly’s fast_iter:

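def fast_iter(context, func, args=(), kwargs=None):
    # http://www.ibm.com/developerworks/xml/library/x-hiperfparse/
    # Author: Liza Daly
    kwargs = kwargs or {}  # avoid a shared mutable default argument
    for event, elem in context:
        func(elem, *args, **kwargs)
        # Free the element's own children, text, and attributes.
        elem.clear()
        # Delete earlier siblings, which the parent would otherwise keep alive.
        while elem.getprevious() is not None:
            del elem.getparent()[0]
    del context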

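fast_iter clears each element once it has been processed, and it also deletes the preceding siblings (possibly with other tags) that are no longer needed. Without that second step the parent keeps a reference to every element already parsed, which is why memory keeps growing with plain iterparse.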

It could be used like this:

import lxml.etree as ET

def process_element(elem):
    ...

context = ET.iterparse(filename, events=('end',), tag=...)
fast_iter(context, process_element)
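To see the pruning in action, here is a minimal self-contained sketch. The sample document, the `item` tag, the `id` attribute and the `seen` list are made up for illustration and are not from the question; the fast_iter body is the same idea as above, repeated so the snippet runs on its own. It assumes lxml is installed.

```python
import io

import lxml.etree as ET


def fast_iter(context, func, args=(), kwargs=None):
    """Call func on every element the context yields, then free it."""
    kwargs = kwargs or {}
    for event, elem in context:
        func(elem, *args, **kwargs)
        # Free the element's own children, text, and attributes.
        elem.clear()
        # Delete earlier siblings so the parent stops referencing them.
        while elem.getprevious() is not None:
            del elem.getparent()[0]
    del context


# Hypothetical sample data: a root with 1000 homogeneous <item> children.
items = "".join("<item id='%d'>value</item>" % i for i in range(1000))
xml_bytes = ("<root>" + items + "</root>").encode("utf-8")

seen = []


def process_element(elem):
    # Illustrative processing step: record the element's id attribute.
    seen.append(elem.get("id"))


context = ET.iterparse(io.BytesIO(xml_bytes), events=("end",), tag="item")
fast_iter(context, process_element)

# lxml exposes the parsed root on the iterator once parsing has finished.
remaining = len(context.root)
print(len(seen), remaining)
```

Every item is still handed to process_element, but once parsing is done the root should hold only the last (already cleared) item rather than all of them. With a real file you would pass the filename to iterparse instead of the BytesIO object.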
