I’m trying to scrape all the inner html from the <p> elements in a

Question

Asked: May 15, 2026 (2026-05-15T04:41:47+00:00)

I’m trying to scrape the inner HTML of all the <p> elements in a web page using BeautifulSoup. The paragraphs contain nested tags, but I don’t care about those; I just want the text inside them.

For example, for:

<p>Red</p>
<p><i>Blue</i></p>
<p>Yellow</p>
<p>Light <b>green</b></p>

How can I extract:

Red
Blue
Yellow
Light green

Neither .string nor .contents[0] does what I need. Nor does .extract(), because I don’t want to have to specify the internal tags in advance – I want to deal with any that may occur.

Is there a ‘just get the visible HTML’ type of method in BeautifulSoup?

UPDATE:

On advice, trying:

soup = BeautifulSoup(open("test.html"))
p_tags = soup.findAll('p',text=True)
for i, p_tag in enumerate(p_tags): 
    print str(i) + p_tag

But that doesn’t help – it prints out:

0Red
1

2Blue
3

4Yellow
5

6Light 
7green
8

1 Answer

Editorial Team · Answer 1 · 2026-05-15T04:41:48+00:00

Short answer: soup.findAll(text=True)

This has already been answered on Stack Overflow, and it is covered in the BeautifulSoup documentation.

UPDATE:

To clarify, a working piece of code:

>>> txt = """\
... <p>Red</p>
... <p><i>Blue</i></p>
... <p>Yellow</p>
... <p>Light <b>green</b></p>
... """
>>> import BeautifulSoup
>>> BeautifulSoup.__version__
'3.0.7a'
>>> soup = BeautifulSoup.BeautifulSoup(txt)
>>> for node in soup.findAll('p'):
...     print ''.join(node.findAll(text=True))

Red
Blue
Yellow
Light green
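
For anyone reading this on Python 3, here is a rough sketch of the same idea. It assumes BeautifulSoup 4 (the `bs4` package), which is not what the transcript above uses (that one is 3.0.7a on Python 2). In version 4, `get_text()` joins all the text nodes under a tag, so the manual `''.join(...)` should not be needed. The helper name `paragraph_texts` is made up for illustration.

```python
from bs4 import BeautifulSoup  # assumption: BeautifulSoup 4, installed as "beautifulsoup4"

html = """\
<p>Red</p>
<p><i>Blue</i></p>
<p>Yellow</p>
<p>Light <b>green</b></p>
"""


def paragraph_texts(markup):
    """Return the text of each <p>, ignoring whatever tags are nested inside.

    Hypothetical helper, not part of BeautifulSoup itself.
    """
    # "html.parser" is the parser bundled with Python, so no extra install is needed.
    soup = BeautifulSoup(markup, "html.parser")
    # get_text() concatenates every text node below the tag, in document order.
    return [p.get_text() for p in soup.find_all("p")]


for text in paragraph_texts(html):
    print(text)
```

With the sample markup this should print the same four lines as the transcript above. If the real pages have stray whitespace inside the paragraphs, `p.get_text(strip=True)` may be worth trying, though it also strips the spaces at the edges of each text node, so "Light " and "green" would be joined without a space.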
