I’m using BeautifulSoup to do some screen-scraping. My problem is this: I need to

Question

Asked: May 15, 2026 (2026-05-15T17:05:12+00:00)

I’m using BeautifulSoup to do some screen-scraping. My problem is this:
I need to extract specific things out of a paragraph. An example:

<p><b><a href="/name/abe">ABE</a></b> &nbsp; <font class="masc">m</font> &nbsp; <font class="info"><a href="/nmc/eng.php" class="usg">English</a>, <a href="/nmc/jew.php" class="usg">Hebrew</a></font><br />Short form of <a href="/name/abraham" class="nl">ABRAHAM</a>

Out of this paragraph, I’m able to extract the name ABE as follows:

for pFound in soup.findAll('p'):
    print pFound
    # will get the names
    x = pFound.find('a').renderContents()
    print x

Now my problem is to extract the other name as well, in the same paragraph.

Short form of <a href="/name/abraham" class="nl">ABRAHAM</a>

I need to extract this only if the a tag is preceded by the text “Short form of”.

Any ideas on how to do this?
There are many such paragraphs in the HTML page, and not all of them contain the text “Short form of”; some have other text in that place.

I think some combination of regex and findNext() may be useful, but I’m not familiar with BeautifulSoup, and I ended up wasting quite a lot of time on this.

Any help would be appreciated.
Thanks.

1 Answer

Editorial Team · Answer 1 · 2026-05-15T17:05:12+00:00

The following should work (written for Python 2 and BeautifulSoup 3):

htm = '''<p><b><a href="/name/abe">ABE</a></b> &nbsp; <font class="masc">m
</font>&nbsp; <font class="info"><a href="/nmc/eng.php" class="usg">English
</a>, <a href="/nmc/jew.php" class="usg">Hebrew</a></font><br />
Short form of <a href="/name/abraham" class="nl">ABRAHAM</a>'''

import BeautifulSoup

soup = BeautifulSoup.BeautifulSoup(htm)

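for p in soup.findAll('p'):
  firsta = True   # the first <a> in each paragraph holds the name itself
  shortf = False  # becomes True once the text 'Short form of' has been seen
  # walk every descendant of the paragraph in document order
  for c in p.recursiveChildGenerator():
    if isinstance(c, BeautifulSoup.NavigableString):
      if 'Short form of' in str(c):
        shortf = True
    elif c.name == 'a':
      # print the first link, and any link that follows 'Short form of'
      if firsta or shortf:
        print c.renderContents()
        firsta = shortf = False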
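
If you are on Python 3 with the newer bs4 package, the same idea can be sketched roughly as below. This is only a minimal sketch, not a drop-in replacement: it assumes bs4 is installed, that the text “Short form of” sits immediately before the link as its previous sibling (as in the sample paragraph), and the function name extract_names and the second test paragraph (ABEL/HEVEL) are made up for illustration.

```python
from bs4 import BeautifulSoup, NavigableString

# First paragraph is the sample from the question; the second one is a
# hypothetical paragraph that has some other text in place of "Short form of".
HTML = '''
<p><b><a href="/name/abe">ABE</a></b> &nbsp; <font class="masc">m</font> &nbsp; <font class="info"><a href="/nmc/eng.php" class="usg">English</a>, <a href="/nmc/jew.php" class="usg">Hebrew</a></font><br />Short form of <a href="/name/abraham" class="nl">ABRAHAM</a></p>
<p><b><a href="/name/abel">ABEL</a></b> &nbsp; <font class="masc">m</font><br />Some other text <a href="/name/hevel" class="nl">HEVEL</a></p>
'''


def extract_names(html):
    """Return (name, short_form_of) pairs, one per <p>; short_form_of may be None."""
    soup = BeautifulSoup(html, "html.parser")
    results = []
    for p in soup.find_all("p"):
        first_link = p.find("a")
        if first_link is None:
            continue  # paragraph without any link: nothing to extract
        name = first_link.get_text(strip=True)

        short_for = None
        for link in p.find_all("a"):
            # The node right before the link must be plain text ending
            # with "Short form of" (ignoring surrounding whitespace).
            prev = link.previous_sibling
            if isinstance(prev, NavigableString) and prev.strip().endswith("Short form of"):
                short_for = link.get_text(strip=True)
                break
        results.append((name, short_for))
    return results


print(extract_names(HTML))  # [('ABE', 'ABRAHAM'), ('ABEL', None)]
```

Checking previous_sibling keeps the test local to each link, so a paragraph with other text in that position simply yields None for the second name.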
