I have an xml file that I download from a url. I would then like to iterate through the xml to find the link to a file with a specific file extension.
My xml looks something like this:
<Foo>
  <bar>
    <file url="http://foo.txt"/>
    <file url="http://bar.doc"/>
  </bar>
</Foo>
I’ve written code to get the xml file like this:
import urllib2, re
from xml.dom.minidom import parseString
file = urllib2.urlopen('http://foobar.xml')
data = file.read()
file.close()
dom = parseString(data)
xmlTag = dom.getElementsByTagName('file')
And then I’d ‘like’ to get something like this to work:
i=0
url = ''
while( i < len(xmlTag)):
    if re.search('*.txt', xmlTag[i].toxml() ) is not None:
        url = xmlTag[i].toxml()
    i = i + 1;
** Some code that parses out the url **
But that throws an error. Anyone have tips on a better approach?
Thanks!
Your last bit of code is, frankly, disgusting.
dom.getElementsByTagName('file') gives you a list of all <file> elements in the tree… just iterate over it.
As an aside, you should NEVER have to do indexing manually with Python. Even in the rare instance you need the index number, just use enumerate:
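For instance, here is a minimal sketch of both ideas, run against the sample XML from the question instead of a downloaded file. The helper name find_urls is hypothetical, and reading the url attribute with getAttribute and checking it with str.endswith (rather than running a regex over toxml()) is my assumption about what you are after.

```python
from xml.dom.minidom import parseString

# Sample document copied from the question; in practice this would be
# the string you read from the downloaded file.
data = """<Foo>
<bar>
<file url="http://foo.txt"/>
<file url="http://bar.doc"/>
</bar>
</Foo>"""

dom = parseString(data)


def find_urls(dom, extension):
    """Collect the url attribute of each <file> element ending in `extension`.

    Hypothetical helper, not part of minidom.
    """
    urls = []
    # Iterate over the elements directly; no manual index needed.
    for node in dom.getElementsByTagName('file'):
        url = node.getAttribute('url')
        if url.endswith(extension):
            urls.append(url)
    return urls


txt_urls = find_urls(dom, '.txt')

# If the position is genuinely needed, enumerate yields (index, element) pairs.
indexed = [(i, node.getAttribute('url'))
           for i, node in enumerate(dom.getElementsByTagName('file'))]
```

Reading the attribute directly also removes the need for the "parse out the url" step, since getAttribute already returns just the URL string.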