I’m trying to convert an XML document to Python data structures.
A sample of the XML:
<SOFTWARES>
<PUBLISHER>Microsoft Corporation</PUBLISHER>
<NAME>Microsoft Office Visio 2010</NAME>
<VERSION>14.0.6029.1000</VERSION>
<FOLDER>C:/Program Files/Microsoft Office/</FOLDER>
<LANGUAGE>Language Neutral</LANGUAGE>
<INSTALLDATE>2012/03/29</INSTALLDATE>
</SOFTWARES>
<SOFTWARES>
<PUBLISHER>Microsoft</PUBLISHER>
<NAME>Update for Microsoft Office 2010 (KB2553310) 64-Bit Edition</NAME>
<INSTALLDATE>0000//0/0/00</INSTALLDATE>
</SOFTWARES>
lxml.de has an excellent example of this: http://lxml.de/FAQ.html#how-can-i-map-an-xml-tree-into-a-dict-of-dicts
def xml_to_dict(element):
    return element.tag, dict(map(xml_to_dict, element)) or element.text
This produces a great dict of dicts, with one flaw: it overwrites existing keys. So when the process is complete I get:
'SOFTWARES': {
    'PUBLISHER': 'Microsoft',
    'NAME': 'Update for Microsoft Office 2010 (KB2553310) 64-Bit Edition',
    'INSTALLDATE': '0000//0/0/00',
},
That is the last SOFTWARES block, regardless of how many came before it. lxml’s function works well because it’s recursive, but I want to write something that can handle duplicate keys, preferably by just putting the SOFTWARES dicts in a list that I can iterate through when the time comes.
Easiest solution for this specific case:
This will give you a list of dictionaries.
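As a rough illustration of the list-of-dictionaries idea, here is a minimal sketch. It is not necessarily the code originally intended here. It assumes the SOFTWARES blocks sit under a single root element (the name INVENTORY is made up for the example, since the sample doesn't show the root), that each SOFTWARES block is flat, and it uses the standard library's xml.etree.ElementTree so it runs without lxml installed. The helper name softwares_to_list is hypothetical too.

```python
import xml.etree.ElementTree as ET

# Hypothetical wrapper element <INVENTORY>: the sample in the question shows
# only the repeated <SOFTWARES> blocks, so the real root name is an assumption.
SAMPLE = """
<INVENTORY>
  <SOFTWARES>
    <PUBLISHER>Microsoft Corporation</PUBLISHER>
    <NAME>Microsoft Office Visio 2010</NAME>
    <VERSION>14.0.6029.1000</VERSION>
  </SOFTWARES>
  <SOFTWARES>
    <PUBLISHER>Microsoft</PUBLISHER>
    <NAME>Update for Microsoft Office 2010 (KB2553310) 64-Bit Edition</NAME>
    <INSTALLDATE>0000//0/0/00</INSTALLDATE>
  </SOFTWARES>
</INVENTORY>
"""


def softwares_to_list(root):
    """Return one flat dict per <SOFTWARES> element found under root."""
    return [
        # Each direct child (PUBLISHER, NAME, ...) becomes a key/value pair.
        {child.tag: child.text for child in software}
        for software in root.iter("SOFTWARES")
    ]


root = ET.fromstring(SAMPLE)
software_list = softwares_to_list(root)

# Every block is kept, so you can simply iterate over the list.
for entry in software_list:
    print(entry["NAME"])
```

Because each SOFTWARES block gets its own dict, repeated blocks no longer overwrite each other. The same loop should also work with lxml's etree, since its elements support iter() and iteration over children in the same way.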