I am still a Python beginner. As a practice project, I want to write my own RSS reader.
I found a helpful tutorial here: learning python. I used the code provided in that tutorial:
#! /usr/bin/env python
import urllib2
from xml.dom import minidom, Node
""" Get the XML """
url_info = urllib2.urlopen('http://rss.slashdot.org/Slashdot/slashdot')
if (url_info):
    """ We have the RSS XML lets try to parse it up """
    xmldoc = minidom.parse(url_info)
    if (xmldoc):
        """We have the Doc, get the root node"""
        rootNode = xmldoc.documentElement
        """ Iterate the child nodes """
        for node in rootNode.childNodes:
            """ We only care about "item" entries"""
            if (node.nodeName == "item"):
                """ Now iterate through all of the <item>'s children """
                for item_node in node.childNodes:
                    if (item_node.nodeName == "title"):
                        """ Loop through the title Text nodes to get
                        the actual title"""
                        title = ""
                        for text_node in item_node.childNodes:
                            if (text_node.nodeType == Node.TEXT_NODE):
                                title += text_node.nodeValue
                        """ Now print the title if we have one """
                        if (len(title)>0):
                            print title
                    if (item_node.nodeName == "description"):
                        """ Loop through the description Text nodes to get
                        the actual description"""
                        description = ""
                        for text_node in item_node.childNodes:
                            if (text_node.nodeType == Node.TEXT_NODE):
                                description += text_node.nodeValue
                        """ Now print the description if we have one.
                        Add a blank line so that it looks better """
                        if (len(description)>0):
                            print description + "\n"
    else:
        print "Error getting XML document!"
else:
    print "Error! Getting URL"
Everything works as expected, and at first I thought I understood it all. But as soon as I use another RSS feed (e.g. “http://www.spiegel.de/schlagzeilen/tops/index.rss”), I get a “terminated” error for my application from the Eclipse IDE. I can’t say more about that error message, since I can’t figure out where exactly and why the app terminates. The debugger doesn’t help much, since it ignores my breakpoints. Well, that’s another issue.
Anybody got an idea what I am doing wrong?
Well, the “terminated” message is not an error; it only informs you that Python has exited without error.
You’re not doing anything wrong. This RSS reader is just not very flexible, since it only knows one variant of RSS.
If you compare the XML documents of Slashdot and Spiegel Online, you see that they are structured differently:
Slashdot:
Spiegel online:
In the feed of Spiegel Online, all <item> elements are inside the <channel> tag, but in the Slashdot feed they are in the root tag (<rdf:RDF>). Your Python code expects the items only in the root tag.
If you want your RSS reader to work for both feeds, you could, for example, change the following line:
To that:
With that, all <item> tags are enumerated, regardless of where they are in the XML document.
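To make the difference concrete, here is a minimal sketch. It assumes the change swaps the loop over rootNode.childNodes for minidom's getElementsByTagName("item"). The two XML strings are hypothetical, heavily simplified stand-ins for the two feed layouts, not the real feeds, and the helper function names are made up for illustration. Unlike the code in the question, the sketch is written for Python 3.

```python
from xml.dom import minidom

# Hypothetical stand-in for a feed whose <item> elements sit directly
# under the root tag (<rdf:RDF>), as described for Slashdot.
RDF_STYLE = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <channel><title>Feed A</title></channel>
  <item><title>First story</title></item>
  <item><title>Second story</title></item>
</rdf:RDF>"""

# Hypothetical stand-in for a feed whose <item> elements are nested
# inside <channel>, as described for Spiegel Online.
CHANNEL_STYLE = """<rss version="2.0">
  <channel>
    <title>Feed B</title>
    <item><title>Third story</title></item>
  </channel>
</rss>"""


def item_title(item):
    """Concatenate the text nodes of the item's <title> child."""
    for child in item.childNodes:
        if child.nodeName == "title":
            return "".join(
                t.nodeValue for t in child.childNodes
                if t.nodeType == t.TEXT_NODE
            )
    return ""


def titles_from_root_children(xml_text):
    """Mimic the original loop: inspect only direct children of the root."""
    root = minidom.parseString(xml_text).documentElement
    return [item_title(n) for n in root.childNodes if n.nodeName == "item"]


def titles_from_anywhere(xml_text):
    """Collect every <item> element, however deeply it is nested."""
    doc = minidom.parseString(xml_text)
    return [item_title(n) for n in doc.getElementsByTagName("item")]


if __name__ == "__main__":
    for name, xml_text in (("rdf", RDF_STYLE), ("channel", CHANNEL_STYLE)):
        print(name, titles_from_root_children(xml_text),
              titles_from_anywhere(xml_text))
```

With the root-children loop, the channel-style sample yields no titles at all, which matches the silent exit seen in Eclipse. The getElementsByTagName variant finds the items in both layouts.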