Assume I have a sample configuration XML file like the following:
<?xml version="1.0"?>
<note>
  <to>Tove</to>
  <infoaboutauthor>
    <nestedprofile>
      <aboutme>
        <gco:CharacterString>I am a 10th grader who likes to play ball.</gco:CharacterString>
      </aboutme>
    </nestedprofile>
  </infoaboutauthor>
  <date>
    <info_date>
      <date>
        <gco:Date>2003-06-13</gco:Date>
      </date>
      <datetype>
        <datetype attribute="Value">
        </datetype>
      </datetype>
    </info_date>
  </date>
  <from>Jani</from>
  <heading>Reminder</heading>
  <body>Don't forget me this weekend!</body>
</note>
In Python (I tried ElementTree, but I'm not sure it's the best choice) I would like to get the values of certain tags. I have tried:
from xml.etree import ElementTree

with open('testfile.xml', 'rt') as f:
    tree = ElementTree.parse(f)
    print('Parsing')
root = tree.getroot()
# This is the lookup that fails: the path uses a prefix with no prefix map.
listofelements = root.findall('gco:CharacterString')
for elementfound in listofelements:
    print(elementfound.text)
With the code above, the lookup fails when the tag name contains a colon; I get the following error:
SyntaxError: prefix 'gco' not found in prefix map
My goal is to
- get the text of the gco:Date tag (“2003-06-13”)
- get the text inside the “aboutme” tag
What is the best way to accomplish this? Is there some way to look up “gco:CharacterString” where parent is equal to “aboutme”? Or is there some convenient way to get it into a dict where I can go mydict['note']['to']['nestedprofile']['aboutme']?
Note: The “gco:” prefix is part of the XML and something I have to deal with. If ElementTree is not appropriate for this, that is okay.
Firstly, your XML is broken. The - in line 2 is breaking the parser. Also, I don’t think it likes the gco:s. Can you possibly use some other XML configuration? Or is this automatically generated by something out of your control?

So here’s what the XML needs to look like for this to work with Python:
And here’s the code to accomplish your two goals:
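The original listing is missing here, so what follows is only a minimal sketch of what such code could look like. It assumes a prefix-free version of the sample (with `gco:CharacterString` renamed to `CharacterString` and `gco:Date` to `Date`), and the names `XML_TEXT`, `about_me` and `date_text` are made up for illustration:

```python
import xml.etree.ElementTree as ET

# Assumed prefix-free rewrite of the sample file (not the asker's real data):
# the "gco:" prefixes have simply been dropped from the tag names.
XML_TEXT = """<?xml version="1.0"?>
<note>
  <to>Tove</to>
  <infoaboutauthor>
    <nestedprofile>
      <aboutme>
        <CharacterString>I am a 10th grader who likes to play ball.</CharacterString>
      </aboutme>
    </nestedprofile>
  </infoaboutauthor>
  <date>
    <info_date>
      <date>
        <Date>2003-06-13</Date>
      </date>
    </info_date>
  </date>
  <from>Jani</from>
</note>"""

root = ET.fromstring(XML_TEXT)

# Goal 1: the date text, reached through its parent chain.
date_text = root.find(".//info_date/date/Date").text

# Goal 2: the CharacterString whose parent is <aboutme>.
about_me = root.find(".//aboutme/CharacterString").text

print(date_text)
print(about_me)
```

The `.//aboutme/CharacterString` path is what restricts the match to elements whose parent is `aboutme`, which is the lookup the question asks about.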
UPDATE
As far as dealing with the “gco:”s goes, you could do something like this:
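The code that belonged here did not survive, so this is a rough sketch of the idea as described in the text: swap the `gco:` prefix for a `stripped_` placeholder before parsing and swap it back afterwards. The helper names `strip_prefix` and `restore_prefix` are hypothetical, and the plain string replacement assumes that neither `gco:` nor `stripped_` occurs anywhere else in the file (for example inside text content).

```python
import xml.etree.ElementTree as ET

PREFIX = "gco:"
PLACEHOLDER = "stripped_"


def strip_prefix(xml_text):
    """Replace 'gco:' with 'stripped_' so tag names no longer contain a colon."""
    return xml_text.replace(PREFIX, PLACEHOLDER)


def restore_prefix(xml_text):
    """Undo strip_prefix so the output uses the original 'gco:' tag names again."""
    return xml_text.replace(PLACEHOLDER, PREFIX)


# Shortened stand-in for the sample file, kept on one logical line so the
# round trip can be compared exactly.
original = (
    "<note><aboutme>"
    "<gco:CharacterString>I am a 10th grader who likes to play ball.</gco:CharacterString>"
    "</aboutme>"
    "<date><gco:Date>2003-06-13</gco:Date></date></note>"
)

# Before the XML operations: strip the prefix, then parse as usual.
root = ET.fromstring(strip_prefix(original))
about_me = root.find(".//aboutme/stripped_CharacterString").text
date_text = root.find(".//stripped_Date").text

# After the XML operations: serialize and put the prefix back.
restored = restore_prefix(ET.tostring(root, encoding="unicode"))
```

This is a text-level workaround rather than real namespace handling, so it is only as safe as the assumption that the two marker strings are unique to the tag names.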
Then before you do the above XML operations run:
Then after the XML operations are done (of course you will need to account for the fact that the gco:Date tag is now stripped_Date, as is the CharacterString tag), run this:

This will preserve the original format and allow you to parse it with etree.
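If the real file declares the prefix somewhere (an `xmlns:gco="..."` attribute on the root or another ancestor), another option is to leave the text alone and give ElementTree a prefix map, which is what the "prefix 'gco' not found in prefix map" message is asking for. A minimal sketch, assuming such a declaration exists; the URI below is the one I would expect for ISO 19139 metadata, but treat it as an assumption and use whatever your file actually declares:

```python
import xml.etree.ElementTree as ET

# Assumption: must match the xmlns:gco value declared in your file.
GCO_URI = "http://www.isotc211.org/2005/gco"
NAMESPACES = {"gco": GCO_URI}

# Shortened stand-in for the sample, with the namespace declared on the root.
XML_TEXT = """<note xmlns:gco="http://www.isotc211.org/2005/gco">
  <infoaboutauthor><nestedprofile><aboutme>
    <gco:CharacterString>I am a 10th grader who likes to play ball.</gco:CharacterString>
  </aboutme></nestedprofile></infoaboutauthor>
  <date><info_date><date>
    <gco:Date>2003-06-13</gco:Date>
  </date></info_date></date>
</note>"""

root = ET.fromstring(XML_TEXT)

# The second argument maps the "gco" prefix used in the path to its URI.
about_me = root.find(".//aboutme/gco:CharacterString", NAMESPACES).text
date_text = root.find(".//gco:Date", NAMESPACES).text

print(about_me)
print(date_text)
```

If the file really has no declaration for `gco`, a namespace-aware parser will reject it outright, and a text-level workaround like the one above is the fallback.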