page=nltk.clean_html(soup.findAll('div',id="bodyContent"))
When I try to run this code, it shows:
Traceback (most recent call last):
  File "C:\Python27\wiki3.py", line 36, in <module>
    page=nltk.clean_html(soup.findAll('div',id="bodyContent"))
  File "C:\Python27\lib\site-packages\nltk-2.0.4-py2.7.egg\nltk\util.py", line 340, in clean_html
    cleaned = re.sub(r"(?is)<(script|style).*?>.*?(</\1>)", "", html.strip())
AttributeError: 'ResultSet' object has no attribute 'strip'
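You are giving clean_html an iterable of BeautifulSoup objects (which is what findAll returns), not a string (which is what clean_html wants). The first thing clean_html does is call .strip() on its argument, and a ResultSet has no such method, hence the AttributeError. Assuming that you want a list of div strings that have each been cleaned, convert each div to a string and pass it to clean_html on its own, either with a list comprehension or with map.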
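Here is a minimal, self-contained sketch of that fix. So that it runs without NLTK or BeautifulSoup installed, it uses two hypothetical stand-ins: `clean_html` is a simplified approximation of NLTK 2.x's `nltk.clean_html` (like the real one in the traceback, it starts with `html.strip()`, so it only accepts a string), and `FakeTag` plays the role of a BeautifulSoup tag whose `str()` is its markup. The list `divs` stands in for the result of `soup.findAll('div', id="bodyContent")`.

```python
import re


# Hypothetical, simplified stand-in for NLTK 2.x's nltk.clean_html.
# Like the real function, it calls .strip() first, so it needs a string.
def clean_html(html):
    # Remove inline <script>/<style> blocks.
    cleaned = re.sub(r"(?is)<(script|style).*?>.*?(</\1>)", "", html.strip())
    # Remove HTML comments.
    cleaned = re.sub(r"(?s)<!--(.*?)-->[\n]?", "", cleaned)
    # Replace the remaining tags with spaces.
    cleaned = re.sub(r"(?s)<.*?>", " ", cleaned)
    # Collapse runs of whitespace (simplified compared with NLTK).
    cleaned = re.sub(r"\s+", " ", cleaned)
    return cleaned.strip()


class FakeTag(object):
    """Hypothetical stand-in for a BeautifulSoup Tag: str() returns its markup."""

    def __init__(self, markup):
        self.markup = markup

    def __str__(self):
        return self.markup


# Stand-in for soup.findAll('div', id="bodyContent"): a list of tag objects.
divs = [
    FakeTag('<div id="bodyContent"><p>Hello <b>world</b></p></div>'),
    FakeTag('<div id="bodyContent"><script>x = 1;</script>Second div</div>'),
]

# Passing the whole list reproduces the error from the question.
try:
    clean_html(divs)
    whole_set_error = None
except AttributeError as exc:
    whole_set_error = str(exc)

# Option 1: clean each div separately with a list comprehension.
pages = [clean_html(str(div)) for div in divs]

# Option 2: the same thing with map (list() makes it work on Python 3 too).
pages_map = list(map(clean_html, map(str, divs)))

# If you want one cleaned page instead, join the markup first.
page = clean_html("".join(str(div) for div in divs))

print(pages)
```

With the real libraries, option 1 would read `pages = [nltk.clean_html(str(div)) for div in soup.findAll('div', id="bodyContent")]`. Note that `clean_html` was removed in NLTK 3 (calling it raises `NotImplementedError` and points you to BeautifulSoup's `get_text()`), so on newer versions `div.get_text()` is the usual replacement.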