Given an HTML link like
<a href='urltxt' class='someclass' close='true'>texttxt</a>
how can I isolate the URL and the text?
Updates
I’m using Beautiful Soup, and am unable to figure out how to do that.
I did
    soup = BeautifulSoup.BeautifulSoup(urllib.urlopen(url))
    links = soup.findAll('a')
    for link in links:
        print 'link content:', link.content,' and attr:',link.attrs
I get
*link content: None and attr: [(u'href', u'_redirectGeneric.asp?genericURL=/root /support.asp')]* ... ...
Why am I missing the content?
edit: elaborated on ‘stuck’ as advised 🙂
Use Beautiful Soup. Doing it yourself is harder than it looks; you’ll be better off using a tried and tested module.
EDIT:
I think you want:
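Something along these lines. This is only a minimal sketch, and it assumes the current bs4 package on Python 3 rather than the BeautifulSoup 3 import used in the question; the HTML string is the sample link from the question.

```python
from bs4 import BeautifulSoup

# The sample link from the question, parsed from a string (no download involved).
html = "<a href='urltxt' class='someclass' close='true'>texttxt</a>"
soup = BeautifulSoup(html, "html.parser")

link = soup.find("a")
url = link.get("href")   # attribute lookup; returns None if there is no href
text = link.string       # the text, when the tag has a single text child

print(url, text)
```

As for the None in the question: Beautiful Soup treats an unknown attribute such as link.content as shorthand for link.find('content'), i.e. a search for a child <content> tag, and there isn't one. The list of children is link.contents (with an s), and link.string gives the text directly when the tag holds a single piece of text.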
By the way, it’s a bad idea to open the URL inline like that; if the request goes wrong, it could get ugly.
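One way to keep the download separate from the parsing, sketched with Python 3's standard urllib.request. The helper name fetch_html, the timeout value, and the file URL are all made up for illustration; the URL points at a file that doesn't exist so the failure path runs without touching the network.

```python
from urllib.request import urlopen

def fetch_html(url, timeout=10):
    """Return the page body as bytes, or None if the request fails."""
    try:
        with urlopen(url, timeout=timeout) as response:
            return response.read()
    except OSError as exc:
        # URLError and HTTPError are subclasses of OSError,
        # so this also covers connection failures and timeouts.
        print("could not fetch", url, "->", exc)
        return None

# Hypothetical URL that deliberately fails.
html = fetch_html("file:///no/such/page.html")
if html is None:
    print("nothing to parse")
```

That way a failed request is dealt with on its own, and only a successful download gets handed to Beautiful Soup.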
EDIT 2:
This should show you all the links in a page:
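Roughly like this. Again a sketch assuming bs4 on Python 3; the page string and the extract_links helper are invented for the example and stand in for a document you have already downloaded.

```python
from bs4 import BeautifulSoup

# Hypothetical page content standing in for a downloaded document.
page = """
<html><body>
<a href="/support.asp">Support</a>
<a href="/about.asp">About <b>us</b></a>
<a name="top">anchor without an href</a>
</body></html>
"""

def extract_links(html):
    """Return a list of (url, text) pairs for every <a> that has an href."""
    soup = BeautifulSoup(html, "html.parser")
    # href=True skips anchors that have no href attribute at all.
    # get_text joins nested text pieces, which .string cannot do.
    return [(a["href"], a.get_text(" ", strip=True))
            for a in soup.find_all("a", href=True)]

links = extract_links(page)
for url, text in links:
    print(url, "->", text)
```

get_text is used here instead of .string because .string is None as soon as a link contains nested tags, as the second link does.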