I am working on a project requiring me to scan through a large number

Question

Asked: June 1, 2026 (2026-06-01T14:56:27+00:00)

I am working on a project requiring me to scan through a large number of HTML-files (8000+). Some of these files are broken but this is an inevitable consequence of the source of the files, and cannot be fixed.

I have chosen to use BeautifulSoup4 to find and extract the data. The code for this is the following:

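from bs4 import BeautifulSoup

# Forward slash avoids '\f' being read as a form-feed escape in the path
with open('data/file.html', encoding='utf-8') as data:
    # Name the parser explicitly so the result does not depend on what is installed
    soup = BeautifulSoup(data, 'html.parser')

tag = soup.find('strong', text="Heading:")

split_tag = str(tag.next_sibling.next_element.next_element).split(", ")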

It opens a file and searches for a strong tag containing the text “Heading:”. It then takes the content that follows this tag and splits it at the commas.

However, if the source file is broken, it has no strong tag containing the text “Heading:”. In that case soup.find() returns None, so the split_tag line raises an AttributeError because None has no next_sibling.

I tried to fix this by using the following method:

try:
    split_tag = str(tag.next_sibling.next_element.next_element).split(", ")
except AttributeError:
    pass
else:
    split_tag = str(tag.next_sibling.next_element.next_element).split(", ")

This did not work. I also tried expressing this as a function but no luck.

So I turn to you. What I want to do is to split the contents at the commas if there are any contents. If not, the script should just pass.

I am very grateful for any assistance!

1 Answer

Editorial Team · Answer 1 · 2026-06-01T14:56:28+00:00

Here is how I would rewrite your exception handler. If there is no such heading, split_tag should end up as a zero-length list.

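from bs4 import BeautifulSoup

# Forward slash avoids '\f' being read as a form-feed escape in the path
with open('data/file.html', encoding='utf-8') as data:
    soup = BeautifulSoup(data, 'html.parser')

tag = soup.find('strong', text="Heading:")

try:
    split_tag = str(tag.next_sibling.next_element.next_element).split(", ")
except AttributeError:
    # tag (or one of the elements after it) was None: fall back to an empty list
    split_tag = []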

But in this case a simple if statement should work well, because soup.find() returns None when nothing is found.

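from bs4 import BeautifulSoup

# Forward slash avoids '\f' being read as a form-feed escape in the path
with open('data/file.html', encoding='utf-8') as data:
    soup = BeautifulSoup(data, 'html.parser')

tag = soup.find('strong', text="Heading:")

if tag is None:
    # Broken file: no matching heading, so there is nothing to split
    split_tag = []
else:
    split_tag = str(tag.next_sibling.next_element.next_element).split(", ")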

When checking for None, it is best to use the is test for object identity, as I showed above.
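
Since you have thousands of files, it may help to wrap the check in a small function you can call once per file. The following is only a sketch under some assumptions: the helper name extract_heading_items is made up for illustration, the sample HTML is invented, and it assumes the comma-separated text sits directly after the strong tag (your real files apparently need two more next_element hops, so adjust the navigation accordingly). It also uses string=, the newer name for the text= argument in Beautiful Soup 4.4.0 and later.

```python
from bs4 import BeautifulSoup


def extract_heading_items(html, label="Heading:"):
    """Return the comma-separated items that follow <strong>label</strong>.

    Hypothetical helper: returns an empty list when the heading, or the
    content after it, is missing.
    """
    soup = BeautifulSoup(html, "html.parser")
    tag = soup.find("strong", string=label)
    if tag is None:
        # Broken file: no such heading
        return []
    following = tag.next_sibling
    if following is None:
        # The heading exists but nothing comes after it
        return []
    # Assumption: the items are plain text directly after the tag
    return [item.strip() for item in str(following).split(",") if item.strip()]


if __name__ == "__main__":
    # Invented sample documents, one intact and one "broken"
    samples = {
        "good.html": "<p><strong>Heading:</strong> a, b, c</p>",
        "broken.html": "<p>No heading here</p>",
    }
    for name, html in samples.items():
        print(name, extract_heading_items(html))
```

Each intermediate step is tested with is None before it is used, so a broken file gives you an empty list instead of an AttributeError, and the loop over your files can simply carry on.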
