I’m parsing some webpages with BeautifulSoup and trying to work within the library (instead

Question

Asked: May 20, 2026 (2026-05-20T23:52:30+00:00)

I’m parsing some webpages with BeautifulSoup and trying to work within the library (instead of trying to solve everything with a brute-force regex).

The page I’m looking at is structured like this:

<!--comment-->
<div>a</div>
<div>b</div>
<div>c</div>
<!--comment-->
<div>a</div>
<div>b</div>
<div>c</div>

I want to parse each section individually. Is there a way to tell BeautifulSoup to split the page into the areas between identical comments?

Thanks!


1 Answer

Editorial Team · Answer 1 · 2026-05-20T23:52:31+00:00

Comments are nodes, like anything else:

# Written for BeautifulSoup 4 (the bs4 package) on Python 3
from bs4 import BeautifulSoup, Comment, NavigableString

soup = BeautifulSoup("""<!--comment--><div>a</div><div>b</div><div>c</div>
                        <!--comment--><div>a</div><div>b</div><div>c</div>""",
                     "html.parser")

# Comment is a subclass of NavigableString, so a string filter can find comments
comments = soup.find_all(string=lambda elm: isinstance(elm, Comment))
for comment in comments:
    next_sib = comment.next_sibling
    # Walk forward until the siblings run out or a text node appears;
    # the NavigableString check covers both whitespace and the next comment
    while next_sib is not None and not isinstance(next_sib, NavigableString):
        # This prints each sibling while it isn't whitespace or another comment
        # Append next_sib to a list, dictionary, etc. and
        # do what you want with it
        print(next_sib)
        next_sib = next_sib.next_sibling
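
On a page laid out like the one in the question, each tag sits on its own line, so the whitespace between tags is parsed as text nodes and the loop above stops at the first one. Here is a minimal sketch of a variant that skips text nodes and collects the tags of each section into a list. It assumes BeautifulSoup 4 (the `bs4` package) and the built-in `html.parser`; `split_by_comments` is a hypothetical helper name, not part of the library.

```python
from bs4 import BeautifulSoup, Comment, Tag

HTML = """<!--comment-->
<div>a</div>
<div>b</div>
<div>c</div>
<!--comment-->
<div>a</div>
<div>b</div>
<div>c</div>"""


def split_by_comments(soup):
    """Return a list of (comment_text, [tags]) pairs, one per comment."""
    sections = []
    for comment in soup.find_all(string=lambda s: isinstance(s, Comment)):
        tags = []
        for sib in comment.next_siblings:
            if isinstance(sib, Comment):
                break  # the next section starts here
            if isinstance(sib, Tag):
                tags.append(sib)  # keep tags, skip whitespace text nodes
        sections.append((comment.strip(), tags))
    return sections


soup = BeautifulSoup(HTML, "html.parser")
for marker, tags in split_by_comments(soup):
    print(marker, [tag.get_text() for tag in tags])
```

Each section comes back as a list of `Tag` objects, so it can be searched or parsed further on its own.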

EDIT:

This doesn’t check that the comments are identical (that is, that their text matches), but you can handle that by comparing each comment’s text with the text of the previous comment.
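
One possible way to apply that check, sketched under the assumption of BeautifulSoup 4 (the `bs4` package): treat the text of the first comment as the section marker and start a new section only at comments whose text matches it. The sample markup, the `<!--other-->` comment, and the variable names are illustrative and not taken from the original page.

```python
from bs4 import BeautifulSoup, Comment, Tag

html = """<!--comment--><div>a</div><div>b</div>
<!--other--><div>x</div>
<!--comment--><div>c</div>"""

soup = BeautifulSoup(html, "html.parser")
comments = soup.find_all(string=lambda s: isinstance(s, Comment))

# Assumption: the first comment's text is the marker that delimits sections
marker = comments[0].strip()

sections = []
for comment in comments:
    if comment.strip() != marker:
        continue  # different comment text, so not a section boundary
    tags = []
    for sib in comment.next_siblings:
        if isinstance(sib, Comment) and sib.strip() == marker:
            break  # the next matching comment closes this section
        if isinstance(sib, Tag):
            tags.append(sib)  # non-matching comments and whitespace are skipped
    sections.append([tag.get_text() for tag in tags])

print(sections)
```

In this sketch a comment with different text does not end a section, so the tags that follow it stay in the section that is currently open.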
