<h2 class=sectionTitle>BACKGROUND</h2> Mr. Paul J. Fribourg has bla bla</span> <div style=margin-top:8px;> <a href=javascript:void(0) onclick=show_more(this);>Read

Question

Asked: May 25, 2026 (2026-05-25T00:05:12+00:00)

<h2 class="sectionTitle">BACKGROUND</h2>
Mr. Paul J. Fribourg has bla bla</span>
<div style="margin-top:8px;">
    <a href="javascript:void(0)" onclick="show_more(this);">Read Full Background</a>
</div>

I would like to extract the text running from "Mr. Paul" to "bla bla".
Some webpages have a <p> in front of "Mr. Paul", so there I can use findNext('p').
However, other webpages do not have a <p>, like the example above.

This is my code for the case where there is a <p>:

import re

# bs2 is the BeautifulSoup object for the page
background = bs2.find(text=re.compile("BACKGROUND"))
bb = background.findNext('p').contents

But when there is no <p>, how can I extract the information?

1 Answer

Editorial Team · Answer 1 · 2026-05-25T00:05:13+00:00

It’s hard to tell from the example you have given us, but it looks to me as though you could just take the next node after the h2 whenever no paragraph follows it. In this example, Lewis Carroll has a paragraph (<p>) tag and your friend Paul has only a stray closing span tag:

>>> from BeautifulSoup import BeautifulSoup
>>>
>>> html = '''
... <h2 class="sectionTitle">BACKGROUND</h2>
... <p>Mr. Lewis Carroll has bla bla</p>
... <div style="margin-top:8px;">
...     <a href="javascript:void(0)" onclick="show_more(this);">Read Full Background</a>
... </div>
... <h2 class="sectionTitle">BACKGROUND</h2>
... Mr. Paul J. Fribourg has bla bla</span>
... <div style="margin-top:8px;">
...     <a href="javascript:void(0)" onclick="show_more(this);">Read Full Background</a>
... </div>
... '''
>>>
>>> soup = BeautifulSoup(html)
>>> headings = soup.findAll('h2', text='BACKGROUND')
>>> for section in headings:
...     p = section.findNext('p')
...     if p:
...         print '> ',  p.string
...     else:
...         print '> ', section.parent.next.next.strip()
...
>  Mr. Lewis Carroll has bla bla
>  Mr. Paul J. Fribourg has bla bla
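
The session above uses BeautifulSoup 3 on Python 2. For readers on Python 3, the same idea might be sketched with the `bs4` package as below. This is only a sketch under assumptions: the helper name `background_text` is hypothetical, the built-in `html.parser` backend is assumed, and it assumes the wanted text is either the first `<p>` sibling or the bare text directly after the heading.

```python
from bs4 import BeautifulSoup
from bs4.element import Tag

HTML = '''
<h2 class="sectionTitle">BACKGROUND</h2>
<p>Mr. Lewis Carroll has bla bla</p>
<div style="margin-top:8px;">
    <a href="javascript:void(0)" onclick="show_more(this);">Read Full Background</a>
</div>
<h2 class="sectionTitle">BACKGROUND</h2>
Mr. Paul J. Fribourg has bla bla</span>
<div style="margin-top:8px;">
    <a href="javascript:void(0)" onclick="show_more(this);">Read Full Background</a>
</div>
'''


def background_text(heading):
    """Hypothetical helper: return the first non-empty text after `heading`.

    Walks the heading's following siblings. Bare text wins if it comes
    first; otherwise the first element is used only if it is a <p>.
    """
    for node in heading.next_siblings:
        if isinstance(node, Tag):
            # First element after the heading: accept a <p>, otherwise give up.
            return node.get_text(strip=True) if node.name == 'p' else None
        text = str(node).strip()
        if text:
            # Non-whitespace text sitting directly after the heading.
            return text
    return None


soup = BeautifulSoup(HTML, 'html.parser')
headings = [h for h in soup.find_all('h2')
            if h.get_text(strip=True) == 'BACKGROUND']
results = [background_text(h) for h in headings]

for result in results:
    print('> ', result)
```

Walking `next_siblings` avoids the `section.parent.next.next` chain, which depends on how BeautifulSoup 3 returns text matches; real pages may nest the text differently, so the helper would need adjusting to the actual markup.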

Following up on the comments, here is the same approach run against the live page:

>>> from BeautifulSoup import BeautifulSoup
>>> from urllib2 import urlopen
>>> html = urlopen('http://investing.businessweek.com/research/stocks/private/person.asp?personId=668561&privcapId=160900&previousCapId=285930&previousTitle=LOEWS%20CORP')
>>> soup = BeautifulSoup(html)
>>> headings = soup.findAll('h2', text='BACKGROUND')
>>> for section in headings:
...     paragraph = section.findNext('p')
...     if paragraph and paragraph.string:
...         print '> ', paragraph.string
...     else:
...         print '> ', section.parent.next.next.strip()
... 
>  Mr. Paul J. Fribourg has been the President of Contigroup Companies Inc. (for [...]

You may, of course, wish to check copyright notices, et cetera…
