I have an XML file from which I want to extract data from certain tags, but ONLY where those tags are nested within other tags; tags with the same names also occur elsewhere in the XML document.
Sample XML:
<root>
<tag1>content I don't want</tag1>
<tag2>content I don't want</tag2>
<tag3>content I don't want</tag3>
<item>
<tag1>content I want</tag1>
<tag2>content I want</tag2>
<tag3>content I want</tag3>
</item>
<item>
<tag1>content I want</tag1>
<tag2>content I want</tag2>
<tag3>content I want</tag3>
</item>
</root>
Python code (which retrieves all data, including from the tags I don’t want):
for counter in range(2):
    variable0 = XML_Document.getElementsByTagName('item')[counter]
    variable1 = XML_Document.getElementsByTagName('tag1')[counter].toxml(encoding="utf-8")
    variable2 = XML_Document.getElementsByTagName('tag2')[counter].toxml(encoding="utf-8")
    variable3 = XML_Document.getElementsByTagName('tag3')[counter].toxml(encoding="utf-8")
    print counter
    print variable1
    print variable2
    print variable3
How do I modify the loop so that it accesses only the data in the tags nested inside the item tags?
You can always call getElementsByTagName() on any subnode:
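A minimal sketch of that idea, assuming Python 3 and xml.dom.minidom (the question's code uses Python 2 print statements, so adjust as needed). The names SAMPLE_XML and extract_items are made up for illustration, and the sketch assumes every item contains exactly one tag1, tag2 and tag3 with plain text content. Because getElementsByTagName() is called on each item element rather than on the document, only the nested tags are searched:

```python
from xml.dom import minidom

# The sample document from the question (hypothetical variable name).
SAMPLE_XML = """<root>
<tag1>content I don't want</tag1>
<tag2>content I don't want</tag2>
<tag3>content I don't want</tag3>
<item>
<tag1>content I want</tag1>
<tag2>content I want</tag2>
<tag3>content I want</tag3>
</item>
<item>
<tag1>content I want</tag1>
<tag2>content I want</tag2>
<tag3>content I want</tag3>
</item>
</root>"""


def extract_items(document):
    """Collect the text of tag1..tag3 from inside each <item> only."""
    rows = []
    for item in document.getElementsByTagName('item'):
        row = {}
        for name in ('tag1', 'tag2', 'tag3'):
            # Searching from the item node ignores the top-level tags.
            node = item.getElementsByTagName(name)[0]
            # Assumes the tag holds a single text node.
            row[name] = node.firstChild.data
        rows.append(row)
    return rows


XML_Document = minidom.parseString(SAMPLE_XML)
for counter, row in enumerate(extract_items(XML_Document)):
    print(counter)
    print(row['tag1'])
    print(row['tag2'])
    print(row['tag3'])
```

If you want the markup rather than just the text, you can keep calling toxml() on the node instead of reading firstChild.data. Iterating over the item list directly also removes the need for the hard-coded range(2).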