I’m going through Asheesh Laroia’s “Scrape the Web” presentation from PyCon 2010 and I have a question about one particular line of code:
title_element = parsed.getElementsByTagName('title')[0]
from the function:
def main(filename):
    # Parse the file
    parsed = xml.dom.minidom.parse(open(filename))
    # Get title element
    title_element = parsed.getElementsByTagName('title')[0]
    # Print just the text underneath it
    print title_element.firstChild.wholeText
I don’t know what role ‘[0]’ is performing at the end of that line. Does ‘xml.dom.minidom.parse’ parse the input into a list?
parse() does not return a list; getElementsByTagName() does. You’re asking for all elements with the tag <title>. Most tags can appear multiple times in a document, so when you ask for those elements, you may get more than one. The obvious way to return them is as a list or tuple. In this case you expect only one <title> tag in the document, so you just take the first element in the list with [0].
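To see this for yourself, here is a small sketch you can run. The sample markup and the variable names are made up for illustration, and it uses parseString() on an inline string instead of parse() on a file so that it is self-contained. It shows that getElementsByTagName() gives back a list-like collection whose length depends on how many matching tags exist, and that [0] simply picks the first one.

```python
import xml.dom.minidom

# Hypothetical sample document, invented for this example.
SAMPLE = (
    "<html><head><title>Example page</title></head>"
    "<body><p>one</p><p>two</p></body></html>"
)

# parseString() returns a single Document object, not a list.
parsed = xml.dom.minidom.parseString(SAMPLE)

# getElementsByTagName() returns a list-like collection of matching elements.
titles = parsed.getElementsByTagName('title')
paragraphs = parsed.getElementsByTagName('p')

print(len(titles))      # one <title> element in the sample
print(len(paragraphs))  # two <p> elements in the sample

# [0] is ordinary indexing: take the first (here, the only) match.
title_element = titles[0]
title_text = title_element.firstChild.wholeText
print(title_text)

# If no element matches, the collection is empty and [0] would raise
# IndexError, so check before indexing when a tag might be absent.
missing = parsed.getElementsByTagName('h1')
first_h1 = missing[0] if missing else None
print(first_h1)
```

If the document might not contain the tag at all, the emptiness check at the end is a safer pattern than indexing with [0] directly.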