From this html source: <div class=category_link> Category: <a href=/category/personal>Personal</a> </div> I wish to extract


Asked: June 1, 2026


From this html source:

<div class="category_link">
  Category:
  <a href="/category/personal">Personal</a>
</div>

I wish to extract the text "Category:".

Here are my attempts using Python/BeautifulSoup (the output of each call is shown as a comment after the #):

parsed = BeautifulSoup(sample_html)
parsed_div = parsed.findAll('div')[0]
parsed_div.firstText() # <a href="/category/personal">Personal</a>
parsed_div.first() # <a href="/category/personal">Personal</a>
parsed_div.findAll()[0] # <a href="/category/personal">Personal</a>

I’d expect a “text node” to be available as the first child. Any suggestions on how I can solve this?


1 Answer

Editorial Team · Answer 1 · 2026-06-01T20:18:15+00:00

I’m fairly sure the following should do what you want

parsed.find('a').previousSibling # or something like that

That returns a NavigableString instance, which behaves almost exactly like
a unicode instance. If you need a plain unicode object, call unicode() on
the result.

I’ll see if I can test this out and let you know.

EDIT: I just confirmed that it works:

>>> from BeautifulSoup import BeautifulSoup
>>> soup = BeautifulSoup('<div class=a>Category: <a href="/">a link</a></div>')
>>> soup.find('a')
<a href="/">a link</a>
>>> soup.find('a').previousSibling
u'Category: '
>>>
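
For readers on the newer bs4 package (Beautiful Soup 4) with Python 3, the same idea can be sketched as below. This is an assumption about your setup, not part of the original answer: in bs4 the attribute is spelled previous_sibling, and NavigableString is a str subclass, so no unicode() call is needed. The helper name leading_text is hypothetical and only used for illustration.

```python
from bs4 import BeautifulSoup

sample_html = """
<div class="category_link">
  Category:
  <a href="/category/personal">Personal</a>
</div>
"""


def leading_text(html):
    """Return the text node that sits directly before the first <a> tag."""
    # Naming the parser explicitly avoids bs4's "no parser specified" warning.
    soup = BeautifulSoup(html, "html.parser")
    link = soup.find("a")
    # previous_sibling is the NavigableString preceding the link;
    # strip() drops the surrounding newlines and indentation.
    return link.previous_sibling.strip()


print(leading_text(sample_html))  # Category:
```

Note that the text node in the question's HTML includes the line breaks and indentation around "Category:", which is why the sketch strips it. The attempts in the question return the link because those calls look for tags, and a text node is not a tag.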
