I’m using Python with ElementTree to parse an XML file. I want to be able to make a list of dictionaries containing the information of all the CDs. I can use this list later to gather information, like displaying the title of CDs coming from the USA. The code below is working, but can easily be broken if the YEAR tag is not the last tag of CD. How can I rewrite this code so that tags could be in any order?
from xml.etree.ElementTree import ElementTree
f = open("cd_catalog.xml")
tree = ElementTree()
tree.parse(f)
catalog = []
cd = {}
for node in tree.iter():
    if node.tag != "CD" and node.tag != "CATALOG":
        tagtext = (node.tag,node.text),
        cd.update(tagtext)
    if node.tag == "YEAR":
        catalog.append(cd)
        cd = {}
for cd in catalog:
    if cd["COUNTRY"] == "USA":
        print("The cd named {0} is from USA".format(cd["TITLE"]))
Two entries of the XML file:
<CATALOG>
    <CD>
        <TITLE>Empire Burlesque</TITLE>
        <ARTIST>Bob Dylan</ARTIST>
        <COUNTRY>USA</COUNTRY>
        <COMPANY>Columbia</COMPANY>
        <PRICE>10.90</PRICE>
        <YEAR>1985</YEAR>
    </CD>
    <CD>
        <TITLE>Hide your heart</TITLE>
        <ARTIST>Bonnie Tyler</ARTIST>
        <COUNTRY>UK</COUNTRY>
        <COMPANY>CBS Records</COMPANY>
        <PRICE>9.90</PRICE>
        <YEAR>1988</YEAR>
    </CD>
</CATALOG>
One way to rewrite your XML parsing code is to define a generator which loops over all the CD elements of the root element (I do not check that this is a CATALOG element, although you could add that check). This generator returns all of the sub-elements of each CD element as a dictionary, so the order of the tags inside a CD no longer matters.
The use of a generator is more efficient than building a list of dictionaries for all the CD elements, particularly if your XML file is very large, since you only ever build a single CD dictionary at a time.
Here is the above method in action:
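As a minimal sketch of the generator approach described above: the function name `iter_cds` and the inline `SAMPLE_XML` string are my own illustrative choices (in practice you would pass the path to `cd_catalog.xml`), and I have deliberately shuffled the tags of the second CD to show that their order does not affect the result.

```python
import io
import xml.etree.ElementTree as ET

# Illustrative sample data: the two entries from the question, with the
# second CD's tags reordered on purpose (YEAR is no longer last).
SAMPLE_XML = """<CATALOG>
    <CD>
        <TITLE>Empire Burlesque</TITLE>
        <ARTIST>Bob Dylan</ARTIST>
        <COUNTRY>USA</COUNTRY>
        <COMPANY>Columbia</COMPANY>
        <PRICE>10.90</PRICE>
        <YEAR>1985</YEAR>
    </CD>
    <CD>
        <YEAR>1988</YEAR>
        <COUNTRY>UK</COUNTRY>
        <TITLE>Hide your heart</TITLE>
        <ARTIST>Bonnie Tyler</ARTIST>
        <PRICE>9.90</PRICE>
        <COMPANY>CBS Records</COMPANY>
    </CD>
</CATALOG>"""


def iter_cds(source):
    """Yield one dict per CD element, mapping each child tag to its text.

    `source` may be a filename or an open file object.
    The root element is not checked to be CATALOG.
    """
    root = ET.parse(source).getroot()
    for cd in root.findall("CD"):
        # Iterating over an element yields its direct children in any order.
        yield {child.tag: child.text for child in cd}


# Build the full list only if you need it more than once.
catalog = list(iter_cds(io.StringIO(SAMPLE_XML)))

# Or consume the generator lazily, one CD dictionary at a time.
usa_titles = [
    cd["TITLE"]
    for cd in iter_cds(io.StringIO(SAMPLE_XML))
    if cd["COUNTRY"] == "USA"
]

for title in usa_titles:
    print("The cd named {0} is from USA".format(title))
```

Because each dictionary is built from whatever children a CD element has, nothing depends on YEAR (or any other tag) appearing last. Note that `cd["COUNTRY"]` would still raise a `KeyError` for a CD without a COUNTRY tag; `cd.get("COUNTRY")` is one way to tolerate that.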