I am parsing a website using BeautifulSoup. I know that the content I want is in a div with class content, and that it is all in p tags, so I ran:
paragraphs = content.findAll('p')
It is fine till here. I iterate over the list, and have an if condition that’ll break out of the loop if a particular class is encountered.
for para in paragraphs:
    if 'class' in para:
        if para['class'] == 'end':
            break
But this isn’t working. The loop doesn’t break when the end class is encountered. In fact, while iterating over the list, the classes of all the elements seem to get lost.
for para in paragraphs:
    if 'class' in para:
        print para['class']
This prints out nothing, even though there are elements with classes. Yet indexing an element directly does print out its class:
>>> paragraphs[0]['class']
u'dateline'
But,
>>> print 'class' in paragraphs[0]
False
I don’t quite understand what is going on here. Eventually I solved my problem by using exceptions, but this is kinda bugging me. Can anybody explain what is happening here?
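When you write if 'class' in para, you’re literally asking whether the string 'class' is one of the paragraph’s child nodes (its contents), not whether the tag has a class attribute. That is why paragraphs[0]['class'] works while 'class' in paragraphs[0] is False. I believe your intention was to see if it has a class, so what you want is: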
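Here is a minimal sketch of that check. It assumes the newer bs4 package with the built-in html.parser rather than the older BeautifulSoup your findAll call suggests; the sample HTML and the kept list are made up for illustration. One difference to watch for: bs4 treats class as multi-valued and returns a list of names, whereas your session shows a plain string (u'dateline'), so with the older library you would compare against the string instead.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup, standing in for the real page.
html = """
<div class="content">
  <p class="dateline">Monday</p>
  <p>First paragraph.</p>
  <p class="end">Footer</p>
  <p>Never reached.</p>
</div>
"""

soup = BeautifulSoup(html, "html.parser")
content = soup.find("div", class_="content")

kept = []
for para in content.find_all("p"):
    # get() looks at the tag's attributes and returns the default ([])
    # when there is no class, so no exception handling is needed.
    # In bs4 the value is a list of class names, hence the "in" test here.
    if "end" in para.get("class", []):
        break
    kept.append(para.get_text())

print(kept)

first = content.find("p")
print(first.has_attr("class"))  # attribute test
print("class" in first)         # child-node test, not an attribute test
```

The loop stops at the paragraph with class end, and the last two lines show the two tests side by side: has_attr asks about attributes, while in asks about the tag's children.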