This is a pretty small question that has been almost resolved in a previous


Asked: June 17, 2026


This is a pretty small question that has been almost resolved in a previous question.

The problem is that right now I have an array of comments, but it doesn't quite do what I need: I get an array of the comments' content, and I need to get the HTML in between.

Say I have something like:

<p>some html here<p>
<!-- begin mark -->
<p>HTML I'm interested in.</p>
<p>More html i want to pull out of the document.</p>
<!-- end mark -->
<!-- begin mark -->
<p>This will be pulled later, but we will come to it when I get to pull the previous section.</p>
<!-- end mark -->
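The code from the earlier question isn't shown here, so this is an assumption: a common way to build such a comments array with BeautifulSoup 4 is to filter the document's strings by the Comment class. Comment is a subclass of NavigableString (and therefore of str), which is why the array seems to hold only the comment text; each entry is still a node in the tree, so its siblings can be reached from it. A sketch assuming Python 3 and the html.parser backend, with a shortened, well-formed sample:

```python
from bs4 import BeautifulSoup, Comment

# Shortened, well-formed version of the sample above (hypothetical content)
html = """<p>some html here</p>
<!-- begin mark -->
<p>First section.</p>
<!-- end mark -->
"""

soup = BeautifulSoup(html, "html.parser")

# Collect every comment node in the document
comments = soup.find_all(string=lambda text: isinstance(text, Comment))

# Each entry behaves like a plain string holding only the comment's text...
print([c.strip() for c in comments])

# ...but it is still part of the tree: skip the "\n" text node to reach the <p>
print(comments[0].next_sibling.next_sibling)
```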

In a reply, someone pointed me to the Crummy explanation of navigating the HTML tree, but I didn't find an answer to my problem there.

Any ideas? Thanks.

PS. Extra kudos if someone can point me to an elegant way to repeat the process several times in a document; I can probably get it to work, but only poorly 😀

Edited to add:

With the information provided by Martijn Pieters, I managed to pass the comments array obtained using the above code to the generator function he wrote. This runs without errors:

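for elem in comments:
    # Pass one comment at a time; allnext() is a lazy generator,
    # so list() is needed to actually collect the following elements
    htmlcode = list(allnext(elem))
    print(htmlcode)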

I think now it will be possible to manipulate the htmlcode content before iterating through the array.

1 Answer

Editorial Team · Answer 1 · 2026-06-17T22:03:50+00:00

You can use the .next_sibling attribute to get to the next element, and use that to find everything following a comment, up to but not including the next comment:

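from bs4 import Comment

def allnext(comment):
    """Yield every sibling after `comment`, up to the next comment."""
    curr = comment
    while True:
        curr = curr.next_sibling
        # Stop at the end of the parent's children (None) or at the next comment
        if curr is None or isinstance(curr, Comment):
            return
        yield curr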

This is a generator function; you use it to iterate over all the ‘next’ elements:

for elem in allnext(comment):
    print(elem)

or you can use it to create a list of all next elements:

elems = list(allnext(comment))
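For the question's PS (repeating the extraction for every marked section), one approach is to start a section only at the "begin mark" comments and let allnext() stop at the following "end mark". A minimal sketch, assuming Python 3 and BeautifulSoup 4 with the html.parser backend; marked_sections() is a hypothetical helper, the sample HTML is a simplified, well-formed version of the question's example, and allnext() is restructured so it also stops cleanly at the end of the parent element:

```python
from bs4 import BeautifulSoup, Comment

def allnext(comment):
    """Yield the siblings after `comment`, stopping at the next comment."""
    curr = comment.next_sibling
    while curr is not None and not isinstance(curr, Comment):
        yield curr
        curr = curr.next_sibling

def marked_sections(soup, begin="begin mark"):
    """Hypothetical helper: HTML between each `begin` comment and the next comment."""
    sections = []
    for c in soup.find_all(string=lambda text: isinstance(text, Comment)):
        # Only begin markers start a section; end markers just stop allnext()
        if c.strip() == begin:
            # str() on a Tag gives its markup; the "\n" text nodes are kept, then trimmed
            sections.append("".join(str(node) for node in allnext(c)).strip())
    return sections

html = """<p>some html here</p>
<!-- begin mark -->
<p>First section.</p>
<p>Still the first section.</p>
<!-- end mark -->
<!-- begin mark -->
<p>Second section.</p>
<!-- end mark -->
"""

soup = BeautifulSoup(html, "html.parser")
for htmlcode in marked_sections(soup):
    print(htmlcode)
```

Each returned string can then be manipulated on its own before moving on to the next section. Skipping the end markers explicitly avoids also collecting the (unwanted) HTML between an "end mark" and the next "begin mark".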

Your example is a little too small for BeautifulSoup, and it’ll wrap each comment in a <p> tag, but if we use a snippet from your original target, www.gamespot.com, this works just fine:

<div class="ad_wrap ad_wrap_dart"><div style="text-align:center;"><img alt="Advertisement" src="http://ads.com.com/Ads/common/advertisement.gif" style="display:block;height:10px;width:120px;margin:0 auto;"/></div>
<!-- start of gamespot gpt ad tag -->
<div id="div-gpt-ad-1359295192-lb-top">
<script type="text/javascript">
        googletag.display('div-gpt-ad-1359295192-lb-top');
    </script>
<noscript>
<a href="http://pubads.g.doubleclick.net/gampad/jump?iu=/6975/row/gamespot.com/home&amp;sz=728x90|970x66|970x150|970x250|960x150&amp;t=pos%3Dtop%26platform%3Ddesktop%26&amp;c=1359295192">
<img src="http://pubads.g.doubleclick.net/gampad/ad?iu=/6975/row/gamespot.com/home&amp;sz=728x90|970x66|970x150|970x250|960x150&amp;t=pos%3Dtop%26platform%3Ddesktop%26&amp;c=1359295192"/>
</a>
</noscript>
</div>
<!-- end of gamespot gpt tag -->
</div>

If comment is a reference to the first comment in that snippet, the allnext() generator gives me:

>>> list(allnext(comment))
[u'\n', <div id="div-gpt-ad-1359295192-lb-top">
<script type="text/javascript">
        googletag.display('div-gpt-ad-1359295192-lb-top');
    </script>
<noscript>
<a href="http://pubads.g.doubleclick.net/gampad/jump?iu=/6975/row/gamespot.com/home&amp;sz=728x90|970x66|970x150|970x250|960x150&amp;t=pos%3Dtop%26platform%3Ddesktop%26&amp;c=1359295192">
<img src="http://pubads.g.doubleclick.net/gampad/ad?iu=/6975/row/gamespot.com/home&amp;sz=728x90|970x66|970x150|970x250|960x150&amp;t=pos%3Dtop%26platform%3Ddesktop%26&amp;c=1359295192"/>
</a>
</noscript>
</div>, u'\n']
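The u'\n' entries in that output are whitespace-only text nodes sitting between the tags. If only the markup matters, a small wrapper can skip them while keeping any text that actually has content. A sketch, assuming Python 3 and BeautifulSoup 4; allnext_tags() is a hypothetical helper name, the markup is a cut-down stand-in for the gamespot snippet, and allnext() is restructured to stop cleanly when it runs out of siblings:

```python
from bs4 import BeautifulSoup, Comment, NavigableString

def allnext(comment):
    """Yield the siblings after `comment`, stopping at the next comment."""
    curr = comment.next_sibling
    while curr is not None and not isinstance(curr, Comment):
        yield curr
        curr = curr.next_sibling

def allnext_tags(comment):
    """Hypothetical helper: like allnext(), but skip whitespace-only text nodes."""
    for node in allnext(comment):
        if isinstance(node, NavigableString) and not node.strip():
            continue
        yield node

# Cut-down stand-in for the gamespot snippet (hypothetical markup)
html = """<div>
<!-- start of ad tag -->
<div id="ad">ad markup</div>
<!-- end of ad tag -->
</div>"""

soup = BeautifulSoup(html, "html.parser")
comment = soup.find(string=lambda text: isinstance(text, Comment))
print(list(allnext(comment)))       # the <div> plus the surrounding "\n" text nodes
print(list(allnext_tags(comment)))  # only the <div id="ad"> tag
```

Filtering on "whitespace-only" rather than on "is a Tag" keeps loose text such as "some text" that appears between the markers.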
