I have an HTML file that looks something like this:
<html>
...
<li class="not a user"> </li>
<li class="user">
<a href="abs" ...> </a>
</li>
<li class="user">
<a href="bss" ...> </a>
</li>
...
</html>
Given the above input, I want to parse the li tags with class="user" and get the values of their anchors' href attributes as output. Is this possible using BeautifulSoup in Python?
My solution was:
data="the above html code snippet"
soup=BeautifulSoup(data)
listset=soup("li","user")
for list in listset:
    attrib_value=[a['href'] for a in list.findAll('a',{'href':True})]
Obviously I have an error somewhere, because it only lists the attribute value for the last anchor tag's href.
Your code is fine. There are three elements in listset, and attrib_value gets overwritten in each iteration of your loop, so at the end of the program it only contains the href values from the last element of listset, which is bss. To keep all values, initialize attrib_value to the empty list before the loop (attrib_value = []) and add each iteration's results to it inside the loop instead of assigning to it.
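A minimal sketch of the accumulating loop described above. It assumes the bs4 package (BeautifulSoup 4) with the built-in "html.parser" backend, uses find_all where the question uses findAll, and fills in the sample HTML without the elided "..." parts. The loop variable is renamed to item (a hypothetical name) to avoid shadowing the built-in list.

```python
from bs4 import BeautifulSoup  # assumption: BeautifulSoup 4 is installed

# Sample input modeled on the question; the elided "..." parts are left out.
data = """
<html>
<li class="not a user"> </li>
<li class="user">
<a href="abs"> </a>
</li>
<li class="user">
<a href="bss"> </a>
</li>
</html>
"""

soup = BeautifulSoup(data, "html.parser")

# A string as the second argument filters on CSS class. Under bs4 this also
# matches the "not a user" item, because "user" is one of its class tokens,
# so listset has three elements. That item has no anchor and adds nothing.
listset = soup.find_all("li", "user")

# Initialize once, before the loop, so earlier results are not overwritten.
attrib_value = []
for item in listset:
    # extend() appends this item's hrefs to the list instead of replacing it.
    attrib_value.extend(a["href"] for a in item.find_all("a", href=True))

print(attrib_value)  # ['abs', 'bss']
```

If the first list item should be excluded from the match itself, a CSS selector on the exact attribute value, such as soup.select('li[class="user"]'), is one option.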