Logical flow of the scraper: article links extracted from an XML feed are put

Asked: June 6, 2026

Logical flow of the scraper: article links extracted from an XML feed are put into a list called self.raw_html. The following [simplified] method is then called to locate the container each article is in and extract the text from it:

def fetch_article_contents(self):
    for article in self.raw_html:
        if self.css_selector_type == 'class':
            soup = article.find(self.html_element,
                                self.css_selector)
        soup = soup.get_text()
        self.article_html.append(soup)
    return self.article_html

This works well on most feeds, but on two notable exceptions (Forbes and Official Google Blog) it fails with the following message when get_text() is called:

AttributeError: 'NoneType' object has no attribute 'get_text'

My first logical step in debugging was to see what was returning a NoneType object, so I stuck a print type(soup) right before soup = soup.get_text(). I found:

<class 'bs4.element.Tag'> (25 times, condensed to save space)
<type 'NoneType'>

This also strikes me as strange because there are currently 29 articles in self.raw_html when fetching the Forbes XML feed, as verified by len(self.raw_html) when the class is initialized.

The Official Google Blog returns:

<class 'bs4.element.Tag'> (just once this time)
<type 'NoneType'>

and in reality has 25 fetched articles.

What is the problem I’m encountering? Thanks!

1 Answer

Editorial Team · Answer 1 · 2026-06-06T14:00:23+00:00

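You don’t show what self.html_element and self.css_selector are, but it seems clear that the article.find call is not finding a match in some articles and is returning None. Because the exception stops the loop at the first such article, you see fewer type lines than there are articles in self.raw_html.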
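
One way to handle this is to check the result of find() before calling get_text(), and to record which articles had no matching container so you can inspect their markup later. The following is only a minimal sketch: it assumes each item in raw_html is an already parsed BeautifulSoup object, as in the question, and the standalone function signature and the missed list are hypothetical names introduced for illustration.

```python
def fetch_article_contents(raw_html, html_element, css_selector):
    """Return (texts, missed).

    texts  -- text of every article whose container was found
    missed -- indexes of articles where find() returned None
    """
    texts = []
    missed = []
    for index, article in enumerate(raw_html):
        # Same call as in the question: a string passed as the second
        # argument is matched against the tag's CSS class.
        container = article.find(html_element, css_selector)
        if container is None:
            # No matching tag in this article: skip it instead of
            # raising AttributeError on None.get_text().
            missed.append(index)
            continue
        texts.append(container.get_text())
    return texts, missed
```

If missed is not empty, printing the raw markup of those articles should show whether the feed uses a different element or class name for some entries.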
