I am trying to extract the text out of nested tags for example the

Asked: May 27, 2026 (2026-05-27T00:15:18+00:00)

I am trying to extract the text out of nested tags. For example, the XML is in the form:

<thread id = 1_1>
  <post id = 1>
    <title>
      <ne>MediaPortal</ne> Install Guide
    </title>
    <content>
      <ne>MediaPortal</ne> Install Guide 0. Introduction and pre-requisites 
      <ne>MediaPortal</ne> is an open-source and free full-fledged <ne>HTPC</ne>
      front-end. It does everything you can ask for in a media center: video 
      playback, music playback, photo viewing, weather, TV tuning and recording, 
      etc. It has wide community support and thanks to it's excellent plug-in 
      and  skinning framework, there are lots of community-developed extensions 
      you can  pick and choose to make it your own. It is far more configurable 
      than <ne>Windows Media Center</ne>, and it works out-of-the-box with the 
      <ne>MCE</ne> remote. And because it provides so much more configuration 
      some find it a daunting task to install and configure. Therefore, this 
      guide will help alleviate some of that burden and help get a 
      <ne>MediaPortal</ne> installation up &amp; running. This guide is not 
      intended to replace the wonderful <ne>MediaPortal</ne> documentation, but 
      rather to introduce the AVS community to <ne>MediaPortal</ne> and provide
      a quick and easy set-up guide. If you need more details on configuration
    </content>
  </post>
</thread>

I need to extract the data within the <ne> tags and save it in a separate file. I am able to do that, and then I extract the <ne> tags out of the BeautifulSoup object. Now I want to extract the text from the <title> and <content> tags and put it in a separate file. Please suggest how this can be achieved.

After extracting the <ne> tags out of the soup object, if I do

for title in soup.find_all('title'):
    print(title.string)

then it prints None on the console for the title tags that contained <ne> tags before the <ne> tags were extracted.

1 Answer

Editorial Team · Answer 1 · 2026-05-27T00:15:19+00:00

From the BeautifulSoup documentation:

For your convenience, if a tag has only one child node,
and that child node is a string, the child node is made
available as tag.string, as well as tag.contents[0].
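As a rough illustration of that rule, here is a minimal sketch. It assumes BeautifulSoup 4 (the bs4 package) with the built-in "html.parser", and a one-line snippet modeled on the question's XML; the variable names are only for illustration.

```python
from bs4 import BeautifulSoup

# A trimmed, made-up line in the style of the question's XML.
snippet = "<content><ne>MediaPortal</ne> is an open-source front-end.</content>"
soup = BeautifulSoup(snippet, "html.parser")

ne = soup.find("ne")            # one child, and that child is a string
content = soup.find("content")  # two children: an <ne> tag and a string

ne_string = ne.string            # the single string child
content_string = content.string  # None, because there is more than one child

print(ne_string)       # MediaPortal
print(content_string)  # None
```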

However, in your case:

>>> t = soup.find('title')
>>> t
<title><ne>MediaPortal</ne> Install Guide</title>

Here the title tag has two children (an <ne> tag and a string), so tag.string is None. You can still use tag.contents or tag.text:

>>> t.contents
[<ne>MediaPortal</ne>, u' Install Guide']
>>> t.text
u'MediaPortal Install Guide'
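To get from there to "text in a separate file", something along these lines might work. It is only a sketch: it assumes BeautifulSoup 4 (bs4) with "html.parser" and a shortened copy of the question's <content> element, and tag_text and save_texts are hypothetical helper names, not part of BeautifulSoup. It uses get_text(), the method form of tag.text.

```python
from bs4 import BeautifulSoup

# Shortened copy of the <content> element from the question.
xml = """
<post>
  <content>
    <ne>MediaPortal</ne> is an open-source and free full-fledged <ne>HTPC</ne>
    front-end.
  </content>
</post>
"""

soup = BeautifulSoup(xml, "html.parser")

def tag_text(tag):
    # get_text() concatenates every string under the tag, nested tags included;
    # split()/join() then collapses the indentation and line breaks.
    return " ".join(tag.get_text().split())

def save_texts(texts, path):
    # Hypothetical helper: write one extracted text per line.
    with open(path, "w", encoding="utf-8") as out:
        for text in texts:
            out.write(text + "\n")

texts = [tag_text(tag) for tag in soup.find_all("content")]
print(texts)
```

Calling save_texts(texts, "content.txt") (the file name is just an example) would then write the extracted text to its own file. Because get_text() already includes the text inside nested tags, the <ne> tags do not need to be removed first.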
