I’m trying to parse an XML with Beautifulsoup, but hit a brick wall when

Asked: May 19, 2026 (2026-05-19T02:11:29+00:00)

I’m trying to parse an XML file with BeautifulSoup, but I hit a brick wall when trying to use the “recursive” argument of findAll().

I have a pretty odd XML format, shown below:

<?xml version="1.0"?>
<catalog>
   <book>
      <author>Gambardella, Matthew</author>
      <title>XML Developer's Guide</title>
      <genre>Computer</genre>
      <price>44.95</price>
      <publish_date>2000-10-01</publish_date>
      <description>An in-depth look at creating applications 
      with XML.</description>
      <book>true</book>
   </book>
   <book>
      <author>Ralls, Kim</author>
      <title>Midnight Rain</title>
      <genre>Fantasy</genre>
      <price>5.95</price>
      <publish_date>2000-12-16</publish_date>
      <description>A former architect battles corporate zombies, 
      an evil sorceress, and her own childhood to become queen 
      of the world.</description>
      <book>false</book>
   </book>
 </catalog>

As you can see, a book tag is repeated inside each book tag, which causes an error when I try to do something like:

from BeautifulSoup import BeautifulStoneSoup as BSS

catalog = "catalog.xml"


def open_rss():
    f = open(catalog, 'r')
    return f.read()

def rss_parser():
    rss_contents = open_rss()
    soup = BSS(rss_contents)
    items = soup.findAll('book', recursive=False)

    for item in items:
        print item.title.string

rss_parser()

As you can see, I’ve added recursive=False to my soup.findAll call, which in theory should make it skip to the next item instead of recursing into the one it just found.

This doesn’t seem to work, as I always get the following error:

  File "catalog.py", line 17, in rss_parser
    print item.title.string
AttributeError: 'NoneType' object has no attribute 'string'

I’m sure I’m doing something stupid here, and I would appreciate some help on how to solve this problem.

Changing the XML structure is not an option, and this code needs to perform well, as it will potentially parse a large XML file.

1 Answer

Editorial Team · Answer 1 · 2026-05-19T02:11:30+00:00

soup.findAll('catalog', recursive=False) will return a list containing only your top-level “catalog” tag. Since that doesn’t have a “title” child, item.title is None.

Try soup.findAll("book") or soup.find("catalog").findChildren() instead.
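To illustrate the difference between searching only direct children and searching every descendant, here is a minimal sketch that uses the standard library's xml.etree.ElementTree rather than BeautifulSoup. That is a swapped-in parser, not the one from the question, and the helper names top_level_books and count_all_books are made up for this example. The sample data is a trimmed copy of the catalog above.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the catalog from the question; the nested <book> flag is kept.
CATALOG_XML = """<?xml version="1.0"?>
<catalog>
   <book>
      <author>Gambardella, Matthew</author>
      <title>XML Developer's Guide</title>
      <book>true</book>
   </book>
   <book>
      <author>Ralls, Kim</author>
      <title>Midnight Rain</title>
      <book>false</book>
   </book>
</catalog>
"""


def top_level_books(xml_text):
    """Return (title, flag) for each <book> that is a direct child of <catalog>."""
    root = ET.fromstring(xml_text)  # root is the <catalog> element
    records = []
    for book in root.findall("book"):  # a bare tag name matches direct children only
        title = book.findtext("title")
        flag = book.findtext("book")  # text of the inner <book> marker
        records.append((title, flag))
    return records


def count_all_books(xml_text):
    """Count every <book> element at any depth, including the nested flags."""
    root = ET.fromstring(xml_text)
    return sum(1 for _ in root.iter("book"))


if __name__ == "__main__":
    for title, flag in top_level_books(CATALOG_XML):
        print(title, flag)
    print(count_all_books(CATALOG_XML))
```

The direct-child search sees two records, while the descendant search also counts the two inner flag elements, which is the confusion the question runs into.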

Edit: OK, the problem wasn’t what I thought it was. Try this:

BSS.NESTABLE_TAGS["book"] = []
soup = BSS(open("catalog.xml"))
soup.catalog.findChildren(recursive=False)
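Since the question mentions that the file may be large, a streaming variant might be worth considering. The sketch below is an assumption on my part, not something from the original answer: it uses xml.etree.ElementTree.iterparse from the standard library instead of BeautifulSoup, and the function name iter_top_level_books is hypothetical. It tracks how many book elements are currently open so that only the outer ones are treated as records.

```python
import io
import xml.etree.ElementTree as ET


def iter_top_level_books(source):
    """Yield (title, flag) for each outer <book>; source is a path or binary file object."""
    depth = 0  # number of <book> elements currently open
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if elem.tag != "book":
            continue
        if event == "start":
            depth += 1
        else:
            depth -= 1
            if depth == 0:  # an outer <book> just closed, not the nested flag
                yield elem.findtext("title"), elem.findtext("book")
                elem.clear()  # release the record's children to limit memory use


if __name__ == "__main__":
    # Small in-memory sample standing in for catalog.xml.
    sample = (
        b"<catalog>"
        b"<book><title>Midnight Rain</title><book>false</book></book>"
        b"</catalog>"
    )
    for title, flag in iter_top_level_books(io.BytesIO(sample)):
        print(title, flag)
```

Passing a file name such as "catalog.xml" instead of the BytesIO object should work the same way, because iterparse accepts either.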
