I’m building a simple web-based RSS reader in Python, but I’m having trouble parsing the XML. I started out by trying some stuff in the Python command line.
>>> from xml.dom import minidom
>>> import urllib2
>>> url = 'http://www.digg.com/rss/index.xml'
>>> xmldoc = minidom.parse(urllib2.urlopen(url))
>>> channelnode = xmldoc.getElementsByTagName('channel')
>>> titlenode = channelnode[0].getElementsByTagName('title')
>>> print titlenode[0]
<DOM Element: title at 0xb37440>
>>> print titlenode[0].nodeValue
None
I played around with this for a while, but the nodeValue of everything seems to be None. Yet if you look at the XML, there definitely are values there. What am I doing wrong?
For RSS feeds you should try the Universal Feed Parser library. It simplifies the handling of RSS feeds immensely.
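If you would rather stay with minidom, the None is expected: in the DOM, an element node has no nodeValue of its own, and its text lives in a child text node. Here is a minimal sketch of reading it, written for Python 3 and using a made-up inline feed (SAMPLE_RSS) in place of the Digg URL so it runs without a network connection. The element_text helper is only an illustrative name, not part of minidom.

```python
from xml.dom import minidom

# Hypothetical sample feed; it stands in for the document fetched from the URL.
SAMPLE_RSS = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example Feed</title>
    <link>http://example.com/</link>
    <item>
      <title>First story</title>
    </item>
  </channel>
</rss>"""


def element_text(element):
    """Join the data of the text and CDATA children of a DOM element."""
    return "".join(
        child.data
        for child in element.childNodes
        if child.nodeType in (child.TEXT_NODE, child.CDATA_SECTION_NODE)
    )


xmldoc = minidom.parseString(SAMPLE_RSS)
channel = xmldoc.getElementsByTagName("channel")[0]
# getElementsByTagName searches all descendants, so index 0 is the
# channel's own title and later entries are the item titles.
title = channel.getElementsByTagName("title")[0]

print(title.nodeValue)             # None: element nodes carry no value
print(title.firstChild.nodeValue)  # Example Feed: the child text node does
print(element_text(title))         # same text, but safe for empty elements
```

Going through firstChild is the shortest fix, but it raises an AttributeError on an empty element such as `<title/>`, which is why the helper loops over childNodes instead.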