I’m parsing some web pages with BeautifulSoup and trying to work within the library (instead of trying to solve everything with a brute-force regex).
The page I’m looking at is structured like this:
<!--comment-->
<div>a</div>
<div>b</div>
<div>c</div>
<!--comment-->
<div>a</div>
<div>b</div>
<div>c</div>
I want to parse each section individually. Is there a way to tell BeautifulSoup to split out the area between identical comments?
Thanks!
Comments are nodes in the parse tree, like anything else, so you can find them and walk their siblings to collect each section.
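As a rough sketch of that idea (not necessarily the code originally posted here), something like the following should work. It assumes the `bs4` package with the built-in `html.parser`, and that the comments and `<div>`s are siblings at the same level, as in the question. The function name `split_by_comments` and the sample markup are made up for illustration.

```python
from bs4 import BeautifulSoup, Comment, Tag

# Sample markup mirroring the structure in the question (illustrative data).
html = """
<!--comment-->
<div>a</div>
<div>b</div>
<div>c</div>
<!--comment-->
<div>a</div>
<div>b</div>
<div>c</div>
"""

def split_by_comments(markup):
    """Return a list of (comment_text, [tags]) pairs, one per comment."""
    soup = BeautifulSoup(markup, "html.parser")
    sections = []
    # Comment is a NavigableString subclass, so it can be matched by type.
    for comment in soup.find_all(string=lambda s: isinstance(s, Comment)):
        tags = []
        for sibling in comment.next_siblings:
            if isinstance(sibling, Comment):
                break  # the next comment starts a new section
            if isinstance(sibling, Tag):
                tags.append(sibling)  # keep elements only; bare text is skipped
        sections.append((str(comment), tags))
    return sections

sections = split_by_comments(html)
for text, tags in sections:
    print(text, [tag.get_text() for tag in tags])
```

Each entry in `sections` holds the comment text and the tags that follow it up to the next comment, so every block can be parsed on its own.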
EDIT:
Walking the comment nodes doesn’t by itself detect whether the comments are identical (same comment text), but you can solve that by checking whether each comment’s text matches that of the previous comment block.
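One way to do that check, as a hedged sketch: only start a new section when the comment text matches, and ignore any other comments. For simplicity this compares against a fixed marker string rather than the previous comment, and it only looks at top-level nodes. Both are assumptions, and `sections_for_marker` is a made-up name.

```python
from bs4 import BeautifulSoup, Comment, Tag

def sections_for_marker(markup, marker):
    """Group tag text into sections that start at comments equal to marker."""
    soup = BeautifulSoup(markup, "html.parser")
    sections = []
    current = None  # stays None until the first matching comment is seen
    for node in soup.children:
        if isinstance(node, Comment):
            if node.strip() == marker:
                current = []  # a matching comment opens a new section
                sections.append(current)
            # comments with other text are ignored, not treated as boundaries
        elif isinstance(node, Tag) and current is not None:
            current.append(node.get_text())
    return sections

# Illustrative input: the "note" comment should not split the first section.
sample = "<!--comment--><div>a</div><!--note--><div>b</div><!--comment--><div>c</div>"
result = sections_for_marker(sample, "comment")
print(result)
```

To compare against the previous comment instead, keep the last comment text seen in a variable and test against that in place of `marker`.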