Given this XML file, I would like to extract the data from it. However, I have trouble extracting the data from <LandmarkPointListXml> onwards.
The XML file:
<?xml version="1.0" encoding="utf-8"?>
<Map xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema">
    <MapName>er</MapName>
    <MapURL>er.gif</MapURL>
    <Name>er</Name>
    <URL>er.gif</URL>
    <LandmarkPointListXml>
        <anyType xsi:type="LandmarkPointProperty">
            <LandmarkPointX>400</LandmarkPointX>
            <LandmarkPointY>292</LandmarkPointY>
            <LandmarkDesc>my room door</LandmarkDesc>
        </anyType>
        <anyType xsi:type="LandmarkPointProperty">
            <LandmarkPointX>399</LandmarkPointX>
            <LandmarkPointY>219</LandmarkPointY>
            <LandmarkDesc>bro room door</LandmarkDesc>
        </anyType>
    </LandmarkPointListXml>
    <RegionPointListXml />
</Map>
Python program:
def GetMapData(self):
    result = ""
    haha = self.XMLdoc.firstChild #root node
    for child in haha.childNodes:
        if (cmp(child.nodeName,'LandmarkPointListXml')==0):
            result = result + '|' + self.loopLandmark(child.childNodes) + '|'
        else:
            result = result + child.firstChild.nodeValue + ','
    return result

def loopLandmark(self, landmarks):
    result=""
    haha=landmarks.getElementsByTagName('anyType')
    for child in haha.childNodes:
        if (cmp(haha.firstChild.nodeName,'LandmarkPointX') == 0):
            result=result+child.firstChild.nodeValue+','
            ChildNode = ChildNode.nextSibling
            result=result+child.firstChild.nodeValue+','
            ChildNode = ChildNode.nextSibling
            result=result+child.firstChild.nodeValue
    return result
I was able to retrieve the result “er,er.gif,er,er.gif,” up to the point where the program reaches <LandmarkPointListXml>.
This code is quite fragile. It makes strong assumptions about the XML input, and would fail if the XML were modified in a valid way (e.g. if <LandmarkPointY> did not come immediately after <LandmarkPointX>).
I suggest parsing XML with an established library, such as ElementTree ( http://docs.python.org/library/xml.etree.elementtree.html ) or lxml ( http://lxml.de ); lxml can also validate your XML input.
The code below uses ElementTree and works on your XML input (I have removed the ‘self’ arguments that tied the functions to the parent class). It also tolerates (ignores) empty values in XML elements.
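As a rough illustration of the ElementTree approach, here is a minimal sketch written for Python 3. The names get_map_data, loop_landmark and text_of are placeholders of mine, not anything from your program. The output format is my guess at what your code was trying to build: comma-separated top-level values, then the landmarks between '|' characters. The ';' between landmarks is purely my assumption, since your code did not separate them. The sample XML is pasted in without the <?xml ...?> declaration to keep the string simple.

```python
import xml.etree.ElementTree as ET

# Sample input copied from the question (XML declaration omitted).
SAMPLE = """<Map xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <MapName>er</MapName>
  <MapURL>er.gif</MapURL>
  <Name>er</Name>
  <URL>er.gif</URL>
  <LandmarkPointListXml>
    <anyType xsi:type="LandmarkPointProperty">
      <LandmarkPointX>400</LandmarkPointX>
      <LandmarkPointY>292</LandmarkPointY>
      <LandmarkDesc>my room door</LandmarkDesc>
    </anyType>
    <anyType xsi:type="LandmarkPointProperty">
      <LandmarkPointX>399</LandmarkPointX>
      <LandmarkPointY>219</LandmarkPointY>
      <LandmarkDesc>bro room door</LandmarkDesc>
    </anyType>
  </LandmarkPointListXml>
  <RegionPointListXml />
</Map>"""


def text_of(element):
    """Return the element's stripped text, or '' if it is missing or empty."""
    if element is None or element.text is None:
        return ""
    return element.text.strip()


def loop_landmark(landmark_list):
    """Format each <anyType> child as 'x,y,desc'; join landmarks with ';'."""
    landmarks = []
    for landmark in landmark_list.findall("anyType"):
        # Fields are looked up by name, so their order in the XML is irrelevant.
        fields = [
            text_of(landmark.find("LandmarkPointX")),
            text_of(landmark.find("LandmarkPointY")),
            text_of(landmark.find("LandmarkDesc")),
        ]
        landmarks.append(",".join(fields))
    return ";".join(landmarks)  # ';' separator is an assumption


def get_map_data(xml_text):
    """Flatten the <Map> document into a single delimited string."""
    root = ET.fromstring(xml_text)
    result = ""
    for child in root:
        if child.tag == "LandmarkPointListXml":
            result += "|" + loop_landmark(child) + "|"
        elif text_of(child):
            # Elements without text (e.g. <RegionPointListXml />) are skipped.
            result += text_of(child) + ","
    return result


if __name__ == "__main__":
    print(get_map_data(SAMPLE))
    # er,er.gif,er,er.gif,|400,292,my room door;399,219,bro room door|
```

Because each field is fetched with find() by tag name rather than by walking nextSibling, reordering the children of a landmark, or adding whitespace between them, does not change the result.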