<h2 class="sectionTitle">BACKGROUND</h2>
Mr. Paul J. Fribourg has bla bla</span>
<div style="margin-top:8px;">
<a href="javascript:void(0)" onclick="show_more(this);">Read Full Background</a>
</div>
I would like to extract the text that starts with "Mr. Paul" and runs to "bla bla".
Some webpages have a <p> in front of "Mr. Paul", so there I can use findNext('p').
However, some webpages do not have a <p>, like the example above.
This is my code for when there is a <p>:
import re
background = bs2.find(text=re.compile("BACKGROUND"))
bb = background.findNext('p').contents
But when there is no <p>, how can I extract the information?
It’s hard to tell from the example you have given us, but it looks to me that you could just get the next node after an
h2. In this example, Lewis Carroll has a paragraph (<p>) tag and your friend Paul has only a closing </span> tag:
Following comments:
You may, of course, wish to check copyright notices, et cetera…
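For what it's worth, here is a minimal sketch of the "next node after the h2" idea. It assumes the newer bs4 package (the question's code looks like an older BeautifulSoup API), and the helper name text_after_heading and the Lewis Carroll markup are made up for illustration. It walks the siblings that follow the heading and returns the first non-empty piece of text, whether that is a bare string or the contents of a <p>.

```python
import re
from bs4 import BeautifulSoup, NavigableString, Tag


def text_after_heading(html, heading="BACKGROUND"):
    """Return the first non-empty text after the heading, with or without <p>."""
    soup = BeautifulSoup(html, "html.parser")
    label = soup.find(string=re.compile(heading))
    if label is None:
        return None
    # label is the heading's text node; its parent is the <h2> itself.
    for node in label.parent.next_siblings:
        if isinstance(node, NavigableString):
            text = node.strip()                # bare text, no <p> around it
        elif isinstance(node, Tag) and node.name == "p":
            text = node.get_text(strip=True)   # text wrapped in a <p>
        else:
            continue                           # skip <div>, <a>, and so on
        if text:
            return text
    return None


# Hypothetical sample pages: one with a <p>, one with only a stray </span>.
with_p = '<h2 class="sectionTitle">BACKGROUND</h2>\n<p>Lewis Carroll wrote bla bla</p>'
without_p = (
    '<h2 class="sectionTitle">BACKGROUND</h2>\n'
    'Mr. Paul J. Fribourg has bla bla</span>\n'
    '<div style="margin-top:8px;"><a href="#">Read Full Background</a></div>'
)

print(text_after_heading(with_p))
print(text_after_heading(without_p))
```

Other parsers (lxml, html5lib) may repair the stray closing tag differently, so the sibling structure could vary from page to page; treat this as a starting point rather than a general solution.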