I am parsing an XML file with the minidom parser. I iterate over the XML and store specific information that sits between the tags in a dictionary.
Like this:
from xml.dom.minidom import parseString

# data holds the XML text
d_source = {}
dom = parseString(data)
macro = dom.getElementsByTagName('macro')
for node in macro:
    id_name = node.getElementsByTagName('id')[0].toxml()
    id_data = id_name.replace('<id>', '').replace('</id>', '')
    print(id_data)
    cl_name = node.getElementsByTagName('cl')[1].toxml()
    cl_data = cl_name.replace('<cl>', '').replace('</cl>', '')
    print(cl_data)
    d_source[id_data] = cl_data
Now, my problem is that the data I'm looking for in cl_name=node.getElementsByTagName('cl')[1].toxml() is sometimes non-existent!
In this case, that part of the XML looks like this:
<cl>blabla</cl>
<cl></cl>
Because of this I receive an "index out of range" error.
However, I really need this "nothing" in my dictionary. My dictionary should look like this:
d = {'blabla': '', 'xyz': 'abc'}
I have to check for the empty text node, which I tried to do like this:
if node.getElementsByTagName('cl')[1].toxml is None:
    print('')
else:
    cl_name = node.getElementsByTagName('cl')[1].toxml()
    cl_data = cl_name.replace('<cl>', '').replace('</cl>', '')
    print(cl_data)
    d_target[id_data] = cl_data
print(d_target)
I still receive that indexing error. I also thought about inserting a whitespace character into the original source file, but I am not sure whether this would solve the issue. Any ideas?
Unless you are required to use minidom, I suggest switching to the standard xml.etree.ElementTree. It is much easier to work with.
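
A minimal sketch of how this could look with ElementTree. The sample XML, the <root> wrapper, and the function name second_cl_by_id are assumptions made for illustration, since the question does not show the full file. It also assumes that <id> and <cl> are direct children of <macro>. The sketch covers both cases: a second <cl> that is empty and a second <cl> that is missing altogether.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data; the layout of the real file is an assumption.
data = """<root>
  <macro>
    <id>blabla</id>
    <cl>first</cl>
    <cl></cl>
  </macro>
  <macro>
    <id>xyz</id>
    <cl>first</cl>
    <cl>abc</cl>
  </macro>
  <macro>
    <id>only_one</id>
    <cl>first</cl>
  </macro>
</root>"""


def second_cl_by_id(xml_text):
    """Map each macro's <id> text to the text of its second <cl>, or ''."""
    result = {}
    root = ET.fromstring(xml_text)
    for macro in root.iter('macro'):
        id_data = macro.findtext('id', default='')
        # findall('cl') only returns direct children of this <macro>.
        cl_nodes = macro.findall('cl')
        # Guard the index: a macro may have fewer than two <cl> elements.
        # An empty element such as <cl></cl> has .text set to None,
        # so "or ''" turns that into an empty string.
        cl_data = (cl_nodes[1].text or '') if len(cl_nodes) > 1 else ''
        result[id_data] = cl_data
    return result


d = second_cl_by_id(data)
print(d)  # {'blabla': '', 'xyz': 'abc', 'only_one': ''}
```

Checking len(cl_nodes) before indexing is what avoids the IndexError. Reading .text directly also removes the need to strip the tags out of the serialized XML by hand.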