import os import xml.etree.ElementTree as et for ev, el in et.iterparse(os.sys.stdin): el.clear() Running the

Question

Asked: June 1, 2026 (2026-06-01T12:39:28+00:00)

import sys
import xml.etree.ElementTree as et

# read the XML dump from standard input as bytes so the parser handles the encoding
for ev, el in et.iterparse(sys.stdin.buffer):
    el.clear()

Running the above on the ODP structure RDF dump results in always increasing memory. Why is that? I understand ElementTree still builds a parse tree, albeit with the child nodes clear()ed. If that is the cause of this memory usage pattern, is there a way around it?


1 Answer

Editorial Team · Answer 1 · 2026-06-01T12:39:29+00:00

You are clearing each element, but the root element still holds a reference to every child that has been parsed. The emptied elements therefore cannot be garbage collected, and the tree keeps growing.
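A minimal sketch of that effect, using a small in-memory document rather than the ODP dump. The `record` and `name` tags and the `count_root_children` helper are made up for illustration:

```python
import io
import xml.etree.ElementTree as ET


def count_root_children(xml_bytes):
    """Clear every parsed element, then report how many children the root still holds."""
    context = ET.iterparse(io.BytesIO(xml_bytes), events=("start", "end"))
    _, root = next(context)  # the first event is the "start" of the root element
    for event, elem in context:
        if event == "end" and elem is not root:
            elem.clear()  # drops elem's own children, text, and attributes only
    return len(root)


# hypothetical sample: 1000 small records under one root
sample = b"<root>" + b"<record><name>x</name></record>" * 1000 + b"</root>"
remaining = count_root_children(sample)
print(remaining)  # prints 1000: every emptied record is still attached to the root
```

Each `record` is empty after `clear()`, but the root's child list still references all of them, which is why memory grows with the size of the input.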

The solution is to clear references in the root, like so:

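import sys
import xml.etree.ElementTree as ET

# source may be a filename or a binary file object; stdin is used here
source = sys.stdin.buffer

# get an iterator that reports both "start" and "end" events
context = ET.iterparse(source, events=("start", "end"))

# the first event is the start of the root element
event, root = next(context)

for event, elem in context:
    if event == "end" and elem.tag == "record":
        # process the completed record element here...
        # then drop the root's references to everything parsed so far
        root.clear()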
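The same pattern as a self-contained sketch that can be run without an input file. The `record` tag comes from the snippet above; the `id` child, the sample data, and the `process_records` helper are assumptions made for illustration:

```python
import io
import xml.etree.ElementTree as ET


def process_records(source, tag="record"):
    """Collect the <id> text of each record, clearing the root after every record."""
    context = ET.iterparse(source, events=("start", "end"))
    _, root = next(context)  # grab the root from its "start" event
    ids = []
    for event, elem in context:
        if event == "end" and elem.tag == tag:
            ids.append(elem.findtext("id"))  # use the record while it is complete
            root.clear()  # detach everything the root has accumulated so far
    return ids, len(root)


# hypothetical sample: 1000 records, each with a numeric <id> child
body = "".join(f"<record><id>{i}</id></record>" for i in range(1000))
sample = f"<root>{body}</root>".encode("utf-8")

ids, remaining = process_records(io.BytesIO(sample))
print(len(ids), remaining)
```

Note that `root.clear()` also resets the root's own attributes and text, so anything needed from the root should be read before the loop.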

Another thing to remember about memory usage, which may not be affecting your situation, is that once the Python interpreter has allocated heap memory from the system, it generally does not give that memory back. Most Java VMs work this way too. So you should not expect the size of the interpreter process in top or ps to decrease, even if that heap memory is no longer in use.

Update: the code above was changed to work in Python 3+.
