page=nltk.clean_html(soup.findAll('div',id="bodyContent")) When I try to run this code, it shows: Traceback (most recent call

Question

Asked: June 17, 2026 (2026-06-17T23:31:31+00:00)

page=nltk.clean_html(soup.findAll('div',id="bodyContent"))

When I try to run this code, it shows:

Traceback (most recent call last):
  File "C:\Python27\wiki3.py", line 36, in <module>
    page=nltk.clean_html(soup.findAll('div',id="bodyContent"))
  File "C:\Python27\lib\site-packages\nltk-2.0.4-py2.7.egg\nltk\util.py", line 340, in clean_html
    cleaned = re.sub(r"(?is)<(script|style).*?>.*?(</\1>)", "", html.strip())
AttributeError: 'ResultSet' object has no attribute 'strip'

1 Answer

Editorial Team · Answer 1 · 2026-06-17T23:31:32+00:00

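You are passing clean_html the ResultSet that findAll returns (a list-like collection of BeautifulSoup Tag objects), not a string, which is what clean_html expects. clean_html calls html.strip() on its argument, and a ResultSet has no strip method, hence the AttributeError.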

Assuming that you want a list of div strings that have each been cleaned, do something like:

page = [nltk.clean_html(str(d)) for d in soup.findAll('div',id="bodyContent")]

or

page = map(nltk.clean_html, map(str, soup.findAll('div',id="bodyContent")))
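
To see the failure and the fix without NLTK or BeautifulSoup installed, here is a minimal standard-library sketch. Note the assumptions: `strip_tags` is a hypothetical stand-in for `clean_html` (it reuses the script/style regex visible in the traceback, then crudely drops the remaining tags), and a plain list of strings stands in for the ResultSet that findAll returns.

```python
import re


def strip_tags(html):
    """Hypothetical stand-in for nltk.clean_html: expects ONE string."""
    # Remove script/style blocks (same regex as in the traceback above).
    cleaned = re.sub(r"(?is)<(script|style).*?>.*?(</\1>)", "", html.strip())
    # Crudely replace every remaining tag with a space.
    cleaned = re.sub(r"(?s)<.*?>", " ", cleaned)
    # Collapse runs of whitespace left behind by the removed tags.
    return re.sub(r"\s+", " ", cleaned).strip()


# Assumption: a plain list stands in for the ResultSet returned by findAll.
divs = [
    '<div id="bodyContent"><p>Hello <b>world</b></p></div>',
    '<div id="bodyContent"><script>var x = 1;</script>Second</div>',
]

# Passing the whole collection fails, because a list has no .strip().
try:
    strip_tags(divs)
    error_message = None
except AttributeError as exc:
    error_message = str(exc)

# Converting each element to a string and cleaning it one at a time works.
page = [strip_tags(str(d)) for d in divs]
```

The same pattern applies to the real call: iterate over the result of findAll and hand clean_html one string per element.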
