I need to find all the visible text inside paragraph elements in an HTML

Asked: June 1, 2026

I need to find all the visible text inside paragraph elements in an HTML file using BeautifulSoup in Python, including the text of tags nested inside the paragraphs (such as links).
For example,
<p>Many hundreds of named mango <a href="/wiki/Cultivar" title="Cultivar">cultivars</a> exist.</p>
should return:
Many hundreds of named mango cultivars exist.

P.S. Some files contain Unicode characters (Hindi) which need to be extracted.
Any ideas how to do that?

1 Answer

Editorial Team · Answer 1 · 2026-06-01T20:55:08+00:00

Here’s how you can do it with BeautifulSoup 4 (bs4). For every <p>, this unwraps each nested tag whose name is not in VALID_TAGS: the tag's markup is removed, but its text is kept in place.

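from bs4 import BeautifulSoup

# Tags whose markup is kept; any other tag nested inside a <p> is unwrapped
VALID_TAGS = ['div', 'p']

# value is the HTML to clean, e.g. the example from the question
value = '<p>Many hundreds of named mango <a href="/wiki/Cultivar" title="Cultivar">cultivars</a> exist.</p>'
soup = BeautifulSoup(value, 'html.parser')

for p in soup.find_all('p'):
    # Check the tags nested inside each paragraph, not the <p> itself
    for tag in p.find_all(True):
        if tag.name not in VALID_TAGS:
            # Replace the tag with its children, so its text is kept
            tag.unwrap()

print(soup)
# <p>Many hundreds of named mango cultivars exist.</p>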
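If you only need the plain text of each paragraph, as in the expected output in the question, unwrapping tags is not required: bs4's `Tag.get_text()` already concatenates the text of all nested tags. The sketch below assumes the bs4 package is installed and that the input files are UTF-8 encoded; `paragraph_texts` and `paragraph_texts_from_file` are hypothetical helper names, and the Hindi paragraph is made-up sample input. In Python 3, strings are Unicode, so Hindi text comes through unchanged as long as the file is decoded with the right encoding.

```python
from bs4 import BeautifulSoup


def paragraph_texts(html):
    """Return the text of every <p>, with nested tags such as <a> flattened."""
    soup = BeautifulSoup(html, "html.parser")
    # get_text() joins all text nodes under the tag and drops the markup
    return [p.get_text() for p in soup.find_all("p")]


def paragraph_texts_from_file(path):
    # Hypothetical helper: decode the file as UTF-8 (an assumption about
    # the input) so that Hindi characters are read correctly
    with open(path, encoding="utf-8") as f:
        return paragraph_texts(f.read())


html = (
    '<p>Many hundreds of named mango '
    '<a href="/wiki/Cultivar" title="Cultivar">cultivars</a> exist.</p>'
    '<p>आम भारत का राष्ट्रीय फल है।</p>'
)

for text in paragraph_texts(html):
    print(text)
```

For the first paragraph this prints "Many hundreds of named mango cultivars exist.", and the Hindi paragraph is printed as-is.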
