From this HTML source:
<div class="category_link">
Category:
<a href="/category/personal">Personal</a>
</div>
I wish to extract the text "Category:".
Here are my attempts using Python/BeautifulSoup (the output of each call is shown in the comment after the #):
parsed = BeautifulSoup(sample_html)
parsed_div = parsed.findAll('div')[0]
parsed_div.firstText() # <a href="/category/personal">Personal</a>
parsed_div.first() # <a href="/category/personal">Personal</a>
parsed_div.findAll()[0] # <a href="/category/personal">Personal</a>
I’d expect a “text node” to be available as the first child. Any suggestions on how I can solve this?
I’m fairly sure the following should do what you want:
That would return a NavigableString instance, which is pretty much the same thing as a unicode instance, but you may call unicode on that to get a unicode object.
I’ll see if I can test this out and let you know.
EDIT: I just confirmed that it works:
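For anyone reading this later, here is a minimal sketch of one way to reach that first text node. It is written under assumptions of my own rather than taken from the answer: it uses the current bs4 package (`from bs4 import BeautifulSoup`) with the built-in "html.parser", and it shows two approaches, `contents[0]` and `find(string=True)`. On Python 3, `str` plays the role that `unicode` plays above.

```python
from bs4 import BeautifulSoup  # assumption: the modern bs4 package

sample_html = """<div class="category_link">
Category:
<a href="/category/personal">Personal</a>
</div>"""

parsed = BeautifulSoup(sample_html, "html.parser")
parsed_div = parsed.find("div")

# The first child of the div is a text node (a NavigableString),
# not the <a> tag; it still carries the surrounding newlines.
first_child = parsed_div.contents[0]

# Searching for strings only also yields the first text node under the div.
first_text = parsed_div.find(string=True)

# Convert to a plain string and trim the whitespace around the label.
category_label = str(first_child).strip()
print(category_label)
```

The attempts in the question return the `<a>` element because tag-oriented searches such as `findAll()` skip text nodes, whereas `contents` lists every child, text included.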