I have the following function, which does a basic job of mapping an lxml object to a dictionary:
from lxml import etree
tree = etree.parse('file.xml')
root = tree.getroot()

def xml_to_dict(el):
    d = {}
    if el.text:
        print '***write tag as string'
        d[el.tag] = el.text
    else:
        d[el.tag] = {}
    children = el.getchildren()
    if children:
        d[el.tag] = map(xml_to_dict, children)
    return d

v = xml_to_dict(root)
At the moment it gives me:
>>> print v
{'root': [{'a': '1'}, {'a': [{'b': '2'}, {'b': '2'}]}, {'aa': '1a'}]}
but I would like:
>>> print v
{'root': {'a': ['1', {'b': [2, 2]}], 'aa': '1a'}}
How do I rewrite the function xml_to_dict(el) so that I get the required output?
Here’s the XML I’m parsing, for clarity:
<root>
  <a>1</a>
  <a>
    <b>2</b>
    <b>2</b>
  </a>
  <aa>1a</aa>
</root>
Thanks 🙂
Well, in Python 2 map() will always return a list, so the easy answer is “don’t use map()”. Instead, build a dictionary like you already are, by looping over children and assigning the result of xml_to_dict(child) to the dictionary key you want to use. It looks like you want to use the tag as the key and have the value be a list of items with that tag, so you’d collect the results in a collections.defaultdict(list) called child_dicts, appending xml_to_dict(child) to child_dicts[child.tag] for each child, and then store child_dicts as d[el.tag].

This leaves the tag entry in the dict as a defaultdict; if you want a normal dict for some reason, use d[el.tag] = dict(child_dicts). Note that, like before, if a tag has both text and children, the text won’t appear in the dict. You may want to think about a different layout for your dict to cope with that.
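A minimal sketch of the recursive defaultdict approach described above, written for Python 3. It assumes the XML from the question is parsed from a string (rather than file.xml) and falls back to the standard library’s xml.etree.ElementTree when lxml isn’t installed, since both expose .tag, .text and iteration over child elements. It uses list(el) instead of el.getchildren(), which lxml documents as deprecated and which was removed from ElementTree in Python 3.9.

```python
import collections

try:
    from lxml import etree  # the library used in the question
except ImportError:
    import xml.etree.ElementTree as etree  # stdlib fallback with the same basic element API

XML = """<root>
  <a>1</a>
  <a>
    <b>2</b>
    <b>2</b>
  </a>
  <aa>1a</aa>
</root>"""


def xml_to_dict(el):
    """Return {tag: text} for leaves, or {tag: {child_tag: [child dicts]}}."""
    d = {}
    if el.text:
        d[el.tag] = el.text
    else:
        d[el.tag] = {}
    children = list(el)  # instead of the deprecated el.getchildren()
    if children:
        # Group each child's dict under the child's tag, keeping document order.
        child_dicts = collections.defaultdict(list)
        for child in children:
            child_dicts[child.tag].append(xml_to_dict(child))
        # Convert to a plain dict; leave out dict() to keep the defaultdict.
        d[el.tag] = dict(child_dicts)
    return d


root = etree.fromstring(XML)
v = xml_to_dict(root)
print(v)
# {'root': {'a': [{'a': '1'}, {'a': {'b': [{'b': '2'}, {'b': '2'}]}}], 'aa': [{'aa': '1a'}]}}
```

Every child still gets its own {tag: ...} wrapper here, which is why the output nests more deeply than the one asked for in the question.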
Code that would produce the output in your rephrased question wouldn’t recurse in xml_to_dict, because you only want a dict for the outer element, not for all child tags. So you’d do the recursion elsewhere, for example in a separate helper that returns either an element’s text or a dict of its children’s values, and have xml_to_dict wrap that result in a single-key dict keyed by el.tag. This still doesn’t handle tags with both text and children sanely, and it turns the collections.defaultdict(list) into a normal dict, so the output is (almost) what you expect. (If you really want integers instead of strings for the text data in the b tags, you’ll have to explicitly turn them into integers somehow.)
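A sketch of the version for the rephrased question, under the same assumptions (Python 3, lxml with an xml.etree.ElementTree fallback, the question’s XML as a string). The helper name xml_to_item and the int_tags parameter are hypothetical and added only for illustration; int_tags is one possible way to turn the text of the b tags into integers. Leaves return their text and every other element returns a plain dict mapping each child tag to a list of values, while xml_to_dict itself does no recursion.

```python
import collections

try:
    from lxml import etree  # the library used in the question
except ImportError:
    import xml.etree.ElementTree as etree  # stdlib fallback with the same basic element API

XML = """<root>
  <a>1</a>
  <a>
    <b>2</b>
    <b>2</b>
  </a>
  <aa>1a</aa>
</root>"""


def xml_to_item(el, int_tags=()):
    """Hypothetical helper: a leaf's text, or {child_tag: [child items]}."""
    children = list(el)
    if not children:
        # Leaf element: return its text, converted to int for tags listed in int_tags.
        if el.tag in int_tags and el.text is not None:
            return int(el.text)
        return el.text
    # Element with children: group the children's values by tag.
    # Any text mixed in with the children is ignored, as in the answer.
    child_items = collections.defaultdict(list)
    for child in children:
        child_items[child.tag].append(xml_to_item(child, int_tags))
    return dict(child_items)  # plain dict rather than a defaultdict


def xml_to_dict(el, int_tags=()):
    # Only the outer element gets a {tag: ...} wrapper; the recursion is in xml_to_item.
    return {el.tag: xml_to_item(el, int_tags)}


root = etree.fromstring(XML)
print(xml_to_dict(root))
# {'root': {'a': ['1', {'b': ['2', '2']}], 'aa': ['1a']}}
print(xml_to_dict(root, int_tags=('b',)))
# {'root': {'a': ['1', {'b': [2, 2]}], 'aa': ['1a']}}
```

The second result differs from the requested output only in 'aa': ['1a']. Unwrapping single-item lists would remove that difference, but then a tag’s value would be a string or a list depending on how many times the tag appears, which is usually harder to consume.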