I am trying to parse an XML file using Python. Due to the size


Asked: June 14, 2026


I am trying to parse an XML file using Python. Due to the size of the XML, I want to use a Pull Parser. I found this one.

My code starts with

from xml.dom import pulldom

doc = pulldom.parse("myfile.xml")
for event, node in doc:
    pass  # code here...

I am using

if (node.localName == "b"):

to get the XML tag name, and it works fine.

What I can’t figure out is how to get the text between the tags. Using node.nodeValue returns None.

I can use node.toxml() to get the full XML for the node, but I only want the text between the tags. Is there a way to do this other than using a regex replace to take the tags out of node.toxml()?


1 Answer

Editorial Team · Answer 1 · 2026-06-14T17:30:13+00:00

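For every tag with text, you get two nodes with local name “b”: one from a START_ELEMENT event and one from an END_ELEMENT event. Both are element nodes, so node.nodeValue is None; the text arrives as a separate CHARACTERS event between them. Normally you should receive something like this: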

START_ELEMENT
CHARACTERS
END_ELEMENT
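
To see this sequence for yourself, here is a minimal sketch. It assumes an inline one-element document passed to parseString instead of myfile.xml, and it filters out the document-level events so only the three event types above remain:

```python
from xml.dom.pulldom import CHARACTERS, END_ELEMENT, START_ELEMENT, parseString

# Inline sample document (an assumption for illustration, not myfile.xml).
doc = parseString("<b>c1</b>")

# Keep only the three event types discussed above.
events = [
    event
    for event, node in doc
    if event in (START_ELEMENT, CHARACTERS, END_ELEMENT)
]
print(events)
```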

So you are looking for the characters after a matching start-element. You may want to try something like this:

from xml.dom.pulldom import CHARACTERS, START_ELEMENT, parse

doc = parse("myfile.xml")
text_expected = False  # True right after a <b> start tag
for event, node in doc:
    if text_expected:
        text_expected = False
        if event != CHARACTERS:
            # No text directly after the start tag (e.g. an empty <b></b>)
            continue
        print(node.data)
    else:
        text_expected = (event == START_ELEMENT) and (node.localName == "b")

With this myfile.xml

<a>
    <b>c1</b>
    <b>c2</b>
</a>

I get the output

c1
c2

Note that you might need to strip() each string, and that you must ignore all other CHARACTERS events: every line break and every run of whitespace between two elements also generates a CHARACTERS event.
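
As an alternative to tracking a flag, pulldom's expandNode() can pull in everything up to the matching end tag and attach it to the current node, after which the text can be read from its children. The following is only a sketch: element_texts is a hypothetical helper name, and the sample XML is passed inline via parseString rather than read from myfile.xml.

```python
from xml.dom import Node
from xml.dom.pulldom import START_ELEMENT, parseString

# Same content as the sample myfile.xml, inlined for a self-contained example.
XML = "<a>\n    <b>c1</b>\n    <b>c2</b>\n</a>"


def element_texts(xml_text, tag):
    """Yield the stripped text of each element named `tag` (hypothetical helper)."""
    doc = parseString(xml_text)
    for event, node in doc:
        if event == START_ELEMENT and node.localName == tag:
            # expandNode() consumes events up to the matching end tag and
            # appends the resulting child nodes to `node`.
            doc.expandNode(node)
            # Join the direct text children, since the parser may deliver
            # the text in more than one chunk.
            text = "".join(
                child.data
                for child in node.childNodes
                if child.nodeType == Node.TEXT_NODE
            )
            yield text.strip()


texts = list(element_texts(XML, "b"))
print(texts)
```

Because only the matching elements are expanded, the rest of the document is still streamed, so this should stay usable for large files as long as each expanded element is small.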
