I have XML files matching a DTD
<!ELEMENT root (node, notinteresting)>
<!ELEMENT node (node*)>
<!ELEMENT notinteresting (#PCDATA)>
and I want to retrieve the topmost node (in XPath: /root/node) of such a file and everything below it, ignoring the notinteresting bit. How can I do this in a few lines of Python? Speed/memory consumption aren’t an issue. I want something out that I can print.
You can use the ElementTree API. Depending on your Python version, the import might be slightly different; you need Python 2.7 or later.
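A minimal sketch of the import, assuming the standard-library module is available (it is on Python 2.7 and 3.x). The fallback to the standalone `elementtree` package is an assumption about very old installations and is not something the answer itself specifies.

```python
# Prefer the standard-library ElementTree, available on Python 2.7 and 3.x.
try:
    import xml.etree.ElementTree as ET
except ImportError:
    # Assumption: very old setups may only ship the standalone package.
    import elementtree.ElementTree as ET

# The two entry points used in this answer.
print(ET.parse.__name__, ET.fromstring.__name__)
```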
With it, you can parse the file, select the /root/node element together with everything below it, and serialize that subtree for printing.
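As a rough sketch of what that could look like: the sample document, the temporary file and the variable names below are all hypothetical, chosen only to match the DTD in the question. `encoding="unicode"` makes `ET.tostring` return text on Python 3; on Python 2.7 you would call `ET.tostring(top_node)` instead.

```python
import os
import tempfile
import xml.etree.ElementTree as ET

# Hypothetical sample document that follows the DTD from the question.
SAMPLE = (
    "<root>"
    "<node><node/><node><node/></node></node>"
    "<notinteresting>skip me</notinteresting>"
    "</root>"
)

# Write the sample to a temporary file so ET.parse has a file to read.
with tempfile.NamedTemporaryFile("w", suffix=".xml", delete=False) as handle:
    handle.write(SAMPLE)
    path = handle.name

try:
    tree = ET.parse(path)
    # find() is relative to the element it is called on, so "node"
    # selects the first direct child of <root>, i.e. /root/node.
    top_node = tree.getroot().find("node")
    # Serialize the subtree; the <notinteresting> sibling is not part of it.
    printable = ET.tostring(top_node, encoding="unicode")
finally:
    os.remove(path)

print(printable)
```

Since speed and memory are not a concern here, loading the whole tree and serializing one element keeps the code short.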
Note that if your input is only a string, you can use fromstring() instead of parse().
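A short sketch of the string variant, using a made-up input string. Unlike `parse`, which returns an `ElementTree`, `fromstring` returns the root `Element` directly, so there is no `getroot()` step.

```python
import xml.etree.ElementTree as ET

# Hypothetical input held in memory rather than in a file.
xml_text = (
    "<root>"
    "<node><node/></node>"
    "<notinteresting>skip me</notinteresting>"
    "</root>"
)

root = ET.fromstring(xml_text)   # the <root> element itself
top_node = root.find("node")     # first direct <node> child of <root>
child_count = len(top_node)      # number of direct children of that node

# encoding="unicode" returns str on Python 3.
print(ET.tostring(top_node, encoding="unicode"))
```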
Update: if “root” is the root element of the XML file, you can also use
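One possible reading of this, offered as an assumption rather than as the original snippet: `ElementTree.find` searches starting at the root element, so when `<root>` is the document root you can ask the tree for `node` directly. The sample text and the use of `io.StringIO` as a stand-in for a file are illustrative only.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical document; StringIO stands in for an open file (Python 3).
xml_text = (
    "<root>"
    "<node><node/></node>"
    "<notinteresting>skip me</notinteresting>"
    "</root>"
)
tree = ET.parse(io.StringIO(xml_text))

# Searching from the tree starts at its root element, so this path
# is relative to <root> and matches /root/node.
via_tree = tree.find("node")

# Equivalent lookup going through the root element explicitly.
via_root = tree.getroot().find("node")

print(via_tree is via_root)
```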