Problems parsing an RSS feed from Stack Exchange in Python

Question 1

Hi,

I am having problems parsing an RSS feed from Stack Exchange in Python.
When I try to get the summary nodes, an empty list is returned.

I have been trying to solve this, but can’t get my head around it.

Can anyone help out?
Thanks
a

In [30]: import lxml.etree, urllib2

In [31]: url_cooking = 'http://cooking.stackexchange.com/feeds'

In [32]: cooking_content = urllib2.urlopen(url_cooking)

In [33]: cooking_parsed = lxml.etree.parse(cooking_content)

In [34]: cooking_texts = cooking_parsed.xpath('.//feed/entry/summary')

In [35]: cooking_texts
Out[35]: []

Editorial Team · Answer 1 · 2026-05-30T06:40:10+00:00

Take a look at these two versions:

import lxml.html, lxml.etree

url_cooking = 'http://cooking.stackexchange.com/feeds'

# lxml.etree version: namespace-aware XML parser, finds nothing
data = lxml.etree.parse(url_cooking)
summary_nodes = data.xpath('.//feed/entry/summary')
print('Found ' + str(len(summary_nodes)) + ' summary nodes')

# lxml.html version: HTML parser ignores namespaces, finds the nodes
data = lxml.html.parse(url_cooking)
summary_nodes = data.xpath('.//feed/entry/summary')
print('Found ' + str(len(summary_nodes)) + ' summary nodes')

As you discovered, the lxml.etree version returns no nodes, while the lxml.html version works fine. The etree version fails because the feed’s elements live in an XML namespace, and an unprefixed XPath step such as feed only matches elements that are in no namespace. The html version works because it ignores namespaces. Part way down http://lxml.de/lxmlhtml.html, it says “The HTML parser notably ignores namespaces and some other XMLisms.”
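To see the difference without fetching the live feed, here is a minimal sketch that parses a small hand-written Atom document and runs the same path with and without a namespace prefix. The sample content is made up for illustration and is not taken from the real feed; the prefix name atom is an arbitrary choice.

```python
import lxml.etree

# Hypothetical miniature Atom feed, standing in for the real response.
SAMPLE_FEED = b"""<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Sample feed</title>
  <entry>
    <title>First question</title>
    <summary>How long should I boil an egg?</summary>
  </entry>
  <entry>
    <title>Second question</title>
    <summary>Can I freeze fresh basil?</summary>
  </entry>
</feed>
"""

root = lxml.etree.fromstring(SAMPLE_FEED)

# Unprefixed steps only match elements that are in no namespace.
plain = root.xpath('//feed/entry/summary')

# Binding a prefix to the Atom namespace makes the same path match.
ns_map = {'atom': 'http://www.w3.org/2005/Atom'}
namespaced = root.xpath('//atom:feed/atom:entry/atom:summary',
                        namespaces=ns_map)

print(len(plain), len(namespaced))
```

The only difference between the two queries is the prefix on each step, which is why the original code parsed the feed without any error and still returned an empty list.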

Note that when you print the root node of the etree version (print(data.getroot())), you get something like <Element {http://www.w3.org/2005/Atom}feed at 0x22d1620>. That means it is a feed element in the namespace http://www.w3.org/2005/Atom. Here is a corrected version of the etree code:

import lxml.etree

url_cooking = 'http://cooking.stackexchange.com/feeds'

# Bind a prefix (the name 'ns' is arbitrary) to the Atom namespace URI
ns = 'http://www.w3.org/2005/Atom'
ns_map = {'ns': ns}

data = lxml.etree.parse(url_cooking)
# Every step of the path needs the prefix
summary_nodes = data.xpath('//ns:feed/ns:entry/ns:summary', namespaces=ns_map)
print('Found ' + str(len(summary_nodes)) + ' summary nodes')
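If you would rather not manage a prefix map, two other spellings may be worth knowing. The sketch below, on a made-up inline Atom document rather than the live feed, uses Clark notation ({namespace}tag) with findall, and the XPath local-name() function, which compares only the tag name. The helper name summary_texts is hypothetical.

```python
import lxml.etree

ATOM_NS = 'http://www.w3.org/2005/Atom'

# Hypothetical sample data, not the real feed content.
SAMPLE_FEED = b"""<feed xmlns="http://www.w3.org/2005/Atom">
  <entry><summary>How long should I boil an egg?</summary></entry>
  <entry><summary>Can I freeze fresh basil?</summary></entry>
</feed>
"""


def summary_texts(root):
    """Return the text of every Atom summary element under root."""
    # Clark notation spells out the namespace URI inside braces.
    nodes = root.findall('.//{%s}summary' % ATOM_NS)
    return [node.text for node in nodes]


root = lxml.etree.fromstring(SAMPLE_FEED)

# Option 1: findall with Clark notation.
texts = summary_texts(root)

# Option 2: local-name() compares the tag name and ignores the namespace.
loose = root.xpath('//*[local-name() = "summary"]')

# QName splits a qualified tag into its namespace and local name.
qname = lxml.etree.QName(root)

print(texts)
print(len(loose), qname.namespace, qname.localname)
```

local-name() is convenient, but it also matches summary elements from any other namespace, so the prefixed form is the safer choice when a feed mixes vocabularies.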
