The Archive Base

Editorial Team
Asked: June 14, 2026 (2026-06-14T17:03:15+00:00)

I am parsing several XML document feeds with BeautifulSoup and would like to do some preprocessing that replaces CDATA sections with custom XML tags. To illustrate:

The following XML source…

<title>The end of the world as we know it</title>
<category><![CDATA[Planking Dancing]]></category>
<pubDate><![CDATA[Sun, 16 Sep 2012 12:00:00 EDT]]></pubDate>
<dc:creator><![CDATA[Bart Simpson]]></dc:creator>

…would turn into:

<title>The end of the world as we know it</title>
<category><myTag>Planking Dancing</myTag></category>
<pubDate><myTag>Sun, 16 Sep 2012 12:00:00 EDT</myTag></pubDate>
<dc:creator><myTag>Bart Simpson</myTag></dc:creator>

I don’t think this question has been asked before on SO (I tried a few different SO queries). I’ve also tried a few different approaches using .findAll('cdata', text=True) and then applying the BeautifulSoup replaceWith() method to each resulting NavigableString. My attempts have resulted in either no substitution or what looks like a recursive loop.

I’m happy to post my previous attempts, but given that the problem here is quite simple I’m hoping someone can post a clear example of how to accomplish the search-and-replace above using BeautifulSoup 3.

1 Answer

    Editorial Team added an answer on June 14, 2026 at 5:03 pm (2026-06-14T17:03:17+00:00)

    CData is a subclass of NavigableString, so you can find all CData
    elements by first searching for all NavigableString objects, and then testing
    whether each is an instance of CData. Once you’ve got one, it’s easily
    replaced using replaceWith, as you suggested:

    >>> from BeautifulSoup import BeautifulSoup, CData, Tag
    >>> source = """
    ... <title>The end of the world as we know it</title>
    ... <category><![CDATA[Planking Dancing]]></category>
    ... <pubDate><![CDATA[Sun, 16 Sep 2012 12:00:00 EDT]]></pubDate>
    ... <dc:creator><![CDATA[Bart Simpson]]></dc:creator>
    ... """
    >>> soup = BeautifulSoup(source)
    >>> for navstr in soup(text=True):
    ...     if isinstance(navstr, CData):
    ...         tag = Tag(soup, "myTag")
    ...         tag.insert(0, navstr[:])
    ...         navstr.replaceWith(tag)
    ... 
    >>> soup
    
    <title>The end of the world as we know it</title>
    <category><myTag>Planking Dancing</myTag></category>
    <pubdate><myTag>Sun, 16 Sep 2012 12:00:00 EDT</myTag></pubdate>
    <dc:creator><myTag>Bart Simpson</myTag></dc:creator>
    
    >>>
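
BeautifulSoup 3 only runs on Python 2, so here is a rough sketch of the same find-then-replace idea using the standard library's xml.dom.minidom on Python 3. This swaps in a different parser and is not the BS3 API. The <item> wrapper, the dc namespace URI, and the helper names iter_nodes and wrap_cdata are assumptions added for illustration, because minidom needs a single root element and a declared prefix.

```python
from xml.dom import minidom
from xml.dom.minidom import Node

# Assumed wrapper: minidom requires one root element and a declared "dc"
# prefix, neither of which is part of the original fragment.
SOURCE = """<item xmlns:dc="http://purl.org/dc/elements/1.1/">
<title>The end of the world as we know it</title>
<category><![CDATA[Planking Dancing]]></category>
<pubDate><![CDATA[Sun, 16 Sep 2012 12:00:00 EDT]]></pubDate>
<dc:creator><![CDATA[Bart Simpson]]></dc:creator>
</item>"""


def iter_nodes(node):
    """Yield node and all of its descendants, depth first."""
    yield node
    for child in list(node.childNodes):
        yield from iter_nodes(child)


def wrap_cdata(doc, tag_name="myTag"):
    """Replace each CDATA section in doc with <tag_name>text</tag_name>."""
    # Collect the CDATA nodes first so the tree is not mutated mid-walk.
    cdata_nodes = [n for n in iter_nodes(doc)
                   if n.nodeType == Node.CDATA_SECTION_NODE]
    for cdata in cdata_nodes:
        wrapper = doc.createElement(tag_name)
        wrapper.appendChild(doc.createTextNode(cdata.data))
        cdata.parentNode.replaceChild(wrapper, cdata)
    return len(cdata_nodes)


doc = minidom.parseString(SOURCE)
replaced = wrap_cdata(doc)
result = doc.documentElement.toxml()
print(result)
```

Collecting the nodes before replacing them avoids changing the tree while it is being traversed, which is one possible cause of the looping behaviour mentioned in the question.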
    

    A couple of notes:

    • You can call a BeautifulSoup object as though it were a function; the
      effect is the same as calling its .findAll() method.

    • The only way I know to get the content of a CData object in BS3 is to slice
      it, as in the snippet above. str(navstr) would keep the <![CDATA[...]]>
      wrapper, which you don’t want here. In BS4, str(navstr) gives you the
      content without the wrapper.
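
To make the slicing note more concrete, here is a toy Python 3 sketch of how a string subclass can decorate its str() form while a slice still returns the bare text. ToyCData is a hypothetical stand-in written for this illustration. It is not BeautifulSoup's CData class, and it only assumes that BS3 behaves in a broadly similar way.

```python
class ToyCData(str):
    """Hypothetical string subclass that decorates its str() form."""

    def __str__(self):
        # self[:] is a plain str, so formatting it does not call __str__ again.
        return "<![CDATA[%s]]>" % self[:]


node = ToyCData("Planking Dancing")

wrapped = str(node)   # goes through ToyCData.__str__, so the markers appear
content = node[:]     # slicing a str subclass returns a plain str

print(wrapped)
print(content)
```

In this toy, slicing sidesteps the overridden __str__ because the result is an ordinary str, which matches the behaviour the note above relies on.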
