This is a pretty small question that has been almost resolved in a previous question.
The problem is that right now I have an array of comments, but it is not quite what I need: I get an array of the comments' contents, and I need the HTML in between them.
Say I have something like:
<p>some html here</p>
<!-- begin mark -->
<p>Html i'm interested at.</p>
<p>More html i want to pull out of the document.</p>
<!-- end mark -->
<!-- begin mark -->
<p>This will be pulled later, but we will come to it when I get to pull the previous section.</p>
<!-- end mark -->
In a reply, they pointed to the Crummy explanation on navigating the HTML tree, but I didn't find an answer to my problem there.
Any ideas? Thanks.
PS. Extra kudos if someone can point me to an elegant way to repeat the process a few times in a document; I can probably get it to work, but poorly 😀
Edited to add:
With the information provided by Martijn Pieters, I was able to pass the comments array obtained with the above code to the generator function he designed. This gives no error:
    for elem in comments:
        htmlcode = allnext(comments)
        print htmlcode
I think now it will be possible to manipulate the htmlcode content before iterating through the array.
You can use the .next_sibling pointer to get to the next element. You can use that to find everything following a comment, up to but not including another comment. Written as a generator function, allnext(), this lets you iterate over all 'next' elements, or you can pass it to list() to create a list of all next elements.
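The original code did not survive here, so what follows is only a minimal sketch of the idea, not necessarily the exact function from the answer. The name allnext() comes from the discussion; the body, the sample markup (wrapped in a hypothetical `<div>` so the comments have siblings), and the choice of the built-in "html.parser" are assumptions. It also shows one possible way to repeat the extraction for every "begin mark" comment in a document.

```python
from bs4 import BeautifulSoup, Comment


def allnext(comment):
    """Yield the siblings that follow `comment`, stopping at the next comment."""
    curr = comment.next_sibling
    while curr is not None and not isinstance(curr, Comment):
        yield curr
        curr = curr.next_sibling


# Hypothetical sample markup based on the question's example.
html = """<div>
<p>some html here</p>
<!-- begin mark -->
<p>Html i'm interested at.</p>
<p>More html i want to pull out of the document.</p>
<!-- end mark -->
<!-- begin mark -->
<p>This will be pulled later.</p>
<!-- end mark -->
</div>"""

soup = BeautifulSoup(html, "html.parser")

# Every comment whose text is "begin mark" (surrounding spaces ignored).
begins = soup.find_all(
    string=lambda s: isinstance(s, Comment) and s.strip() == "begin mark"
)

# One list of elements per begin mark; whitespace strings are included.
sections = [list(allnext(c)) for c in begins]

for section in sections:
    # Join each section back into a single HTML string.
    print("".join(str(elem) for elem in section).strip())
```

Because allnext() stops at any comment, the "end mark" comment is what closes each section; if a document has other comments inside a section, the stop condition would need to check the comment text instead.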
Your example is a little too small for BeautifulSoup and it'll wrap each comment in a <p> tag, but if we use a snippet from your original target, www.gamespot.com, this works just fine. If comment is a reference to the first comment in that snippet, the allnext() generator gives me every element between that comment and the next one.