I have the following function which does a crude job of parsing an XML file into a dictionary.
Unfortunately, since Python dictionaries are not ordered, I am unable to cycle through the nodes as I would like.
How do I change this so that it outputs an ordered dictionary that reflects the original order of the nodes when iterated over with a for loop?
    def simplexml_load_file(file):
        import collections
        from lxml import etree

        tree = etree.parse(file)
        root = tree.getroot()

        def xml_to_item(el):
            item = None
            if el.text:
                item = el.text
            child_dicts = collections.defaultdict(list)
            for child in el.getchildren():
                child_dicts[child.tag].append(xml_to_item(child))
            return dict(child_dicts) or item

        def xml_to_dict(el):
            return {el.tag: xml_to_item(el)}

        return xml_to_dict(root)

    x = simplexml_load_file('routines/test.xml')
    print x
    for y in x['root']:
        print y
Outputs:
    {'root': {
        'a': ['1'],
        'aa': [{'b': [{'c': ['2']}, '2']}],
        'aaaa': [{'bb': ['4']}],
        'aaa': ['3'],
        'aaaaa': ['5']
    }}
    a
    aa
    aaaa
    aaa
    aaaaa
How can I implement collections.OrderedDict so that I can be sure of getting the correct order of the nodes?
XML file for reference:
    <root>
        <a>1</a>
        <aa>
            <b>
                <c>2</c>
            </b>
            <b>2</b>
        </aa>
        <aaa>3</aaa>
        <aaaa>
            <bb>4</bb>
        </aaaa>
        <aaaaa>5</aaaaa>
    </root>
You could use the new OrderedDict dict subclass, which was added to the standard library’s collections module in version 2.7✶. Actually, what you need is an Ordered+defaultdict combination, which doesn’t exist, but it’s possible to create one by subclassing OrderedDict as illustrated below:
✶ If your version of Python doesn’t have OrderedDict, you should be able to use Raymond Hettinger’s Ordered Dictionary for Py2.4 ActiveState recipe as the base class instead.
The output produced from your test XML file looks like this:
Which I think is close to what you want.
Minor update:
Added a __reduce__() method, which allows instances of the class to be pickled and unpickled properly. This wasn’t necessary for this question, but came up in a similar one.
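As a rough sketch of the approach described above (not necessarily the exact code from this answer), an ordered defaultdict can be built by subclassing OrderedDict and adding a __missing__() method, plus a __reduce__() method for pickling. The names OrderedDefaultdict, simplexml_load_string and SAMPLE_XML are illustrative assumptions, and the standard library’s xml.etree.ElementTree stands in for lxml here so that the example runs without extra dependencies.

```python
from collections import OrderedDict
import xml.etree.ElementTree as etree  # assumption: stdlib stand-in for lxml.etree


class OrderedDefaultdict(OrderedDict):
    """An OrderedDict that creates missing values like collections.defaultdict."""

    def __init__(self, default_factory=None, *args, **kwargs):
        if default_factory is not None and not callable(default_factory):
            raise TypeError('first argument must be callable or None')
        super(OrderedDefaultdict, self).__init__(*args, **kwargs)
        self.default_factory = default_factory

    def __missing__(self, key):
        # Called by dict.__getitem__ when the key is absent.
        if self.default_factory is None:
            raise KeyError(key)
        self[key] = value = self.default_factory()
        return value

    def __reduce__(self):
        # Rebuild by calling the class with the factory, then re-inserting
        # the items one by one in their original order.
        return (type(self), (self.default_factory,), None, None,
                iter(self.items()))


def xml_to_item(el):
    """Convert an element into its text or an ordered dict of its children."""
    item = el.text or None
    child_dicts = OrderedDefaultdict(list)
    for child in el:
        child_dicts[child.tag].append(xml_to_item(child))
    # An empty OrderedDict is falsy, so leaf elements fall back to their text.
    return OrderedDict(child_dicts) or item


def simplexml_load_string(xml_text):
    """Hypothetical helper: like simplexml_load_file, but parses a string."""
    root = etree.fromstring(xml_text)
    return OrderedDict([(root.tag, xml_to_item(root))])


SAMPLE_XML = """<root>
    <a>1</a>
    <aa>
        <b>
            <c>2</c>
        </b>
        <b>2</b>
    </aa>
    <aaa>3</aaa>
    <aaaa>
        <bb>4</bb>
    </aaaa>
    <aaaaa>5</aaaaa>
</root>"""

x = simplexml_load_string(SAMPLE_XML)
for y in x['root']:
    print(y)  # tags appear in document order: a, aa, aaa, aaaa, aaaaa
```

Note that children sharing a tag are still grouped into one list under that tag, so the ordering reflects where each tag first appears among its siblings.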