I am using Beautiful Soup to pull out specific div tags, and it seems


Editorial Team

Asked: June 15, 2026 (2026-06-15T18:38:35+00:00)


I am using Beautiful Soup to pull out specific div tags, and it seems I can’t use
simple string matching.

The page has some tags in the form of

<div class="comment form new"...>

which I want to ignore, and also some tags in the form of

<div class="comment comment-xxxx...">

where the x’s represent an integer of arbitrary length, and the ellipsis represents an arbitrary number of other whitespace-separated values (which I’m not concerned about). I can’t figure out the
correct regular expression, especially since I’ve never used Python’s re module.

Using

soup.find_all(class_="comment")

finds every tag whose class includes the word comment, which covers both kinds of div. I have tried using

soup.find_all(class_=re.compile(r'(comment)( )(comment)'))
soup.find_all(class_=re.compile(r'comment comment.*'))

and lots of other variations, but I think I’m missing something obvious about how regular expressions or match() work. Can anyone help me out?


1 Answer

Editorial Team · Answer 1 · 2026-06-15T18:38:37+00:00

I think I’ve got it:

>>> [div['class'] for div in soup.find_all('div')]
[['comment', 'form', 'new'], ['comment', 'comment-xxxx...']]

Notice that, unlike the equivalent in BS3, it’s not this:

['comment form new', 'comment comment-xxxx...']

And that’s why your regexps won’t match.
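
To make that concrete, here is a rough stdlib-only sketch of what testing a pattern against each class token separately looks like. The helper `any_token_matches` is hypothetical and only imitates the behaviour described above; it is not Beautiful Soup's actual code.

```python
import re


def any_token_matches(class_attr, pattern):
    """Return True if `pattern` is found in any single class token.

    Hypothetical helper: splits the attribute on whitespace and searches
    each token on its own, imitating the list-of-classes view shown above.
    """
    return any(pattern.search(token) for token in class_attr.split())


wanted = "comment comment-1234 odd"   # made-up class value for illustration
unwanted = "comment form new"

# A pattern containing a space can never be found inside one token.
spaced = re.compile(r"comment comment")
print(any_token_matches(wanted, spaced))    # False

# The bare word "comment" is a token of both divs, so it matches both.
bare = re.compile(r"comment")
print(any_token_matches(wanted, bare))      # True
print(any_token_matches(unwanted, bare))    # True
```

Because no single token contains a space, a pattern such as 'comment comment.*' has nothing to match against in this model.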

But you can match, e.g., this:

>>> soup.find_all('div', class_=re.compile('comment-'))
[<div class="comment comment-xxxx..."></div>]

Note that BS does the equivalent of re.search, not re.match, so you don’t need 'comment-.*'. Of course, if you want to match 'comment-12345' but not 'comment-of-another-kind', you’d want, e.g., 'comment-\d+'.
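
Putting it together, a minimal sketch of the 'comment-\d+' approach might look like the following. The HTML snippet, the class values, and the choice of the built-in html.parser backend are assumptions made up for illustration; only find_all with a compiled pattern for class_ comes from the answer above.

```python
import re

from bs4 import BeautifulSoup

# Made-up sample markup: one form div to ignore, one numbered comment,
# and one div whose class merely starts with "comment-".
html = """
<div class="comment form new">form</div>
<div class="comment comment-12345 odd">first comment</div>
<div class="comment comment-of-another-kind">something else</div>
"""

soup = BeautifulSoup(html, "html.parser")

# The pattern is searched for within the class values, so no trailing
# ".*" is needed; \d+ requires at least one digit after the hyphen.
matches = soup.find_all("div", class_=re.compile(r"comment-\d+"))

for div in matches:
    print(div["class"], div.get_text())
```

With this sample markup, only the div carrying comment-12345 should be returned; the form div and the comment-of-another-kind div have no class value with digits after the hyphen.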
