The Archive Base

Editorial Team
Asked: June 7, 2026 (2026-06-07T16:38:44+00:00)

In a follow up to my last question, if you have a string

As a follow-up to my last question: if an XML file contains a malformed string, you can use preg_replace_callback() to pull out the contents of the elements that break parsing.

The point of this function is not to parse the XML with regex (a bad idea), but to find XML that doesn't parse, and where it fails, so that we can flag articles that aren't correctly formatted before they are sent out. This is part of a set of tools to clean content before delivery. I am testing it on known malformed public RSS URLs as well as internal ones to see whether it copes with a range of situations.

The callback replaces each node's contents with an integer index. If the XML parses after that substitution, we can report the index of the failing article, use DOMDocument to try to correct the HTML, and try again. If that fails, we report it as critical; otherwise, we return the parsed article description and content to the database, marking it as modified before delivery.
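For the "find where it fails" step, a minimal sketch along these lines might help. It assumes libxml's internal error buffer is acceptable for reporting; `xml_parse_errors()` is a hypothetical helper name, not part of the original tool.

```php
<?php
// Hypothetical helper: report where libxml gives up on a piece of XML.
// Returns an empty array when the string parses cleanly.
function xml_parse_errors(string $xml): array
{
    // Collect errors in libxml's buffer instead of emitting PHP warnings.
    $previous = libxml_use_internal_errors(true);
    libxml_clear_errors();

    $doc = new DOMDocument();
    $doc->loadXML($xml);

    $errors = array();
    foreach (libxml_get_errors() as $error) {
        $errors[] = array(
            'line'    => $error->line,
            'column'  => $error->column,
            'message' => trim($error->message),
        );
    }

    // Restore the previous error-handling mode for the rest of the script.
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    return $errors;
}

$bad = "<item>\n<description>unclosed</item>";
print_r(xml_parse_errors($bad));
```

Each entry carries the line and column libxml reported, which could be logged next to the article index.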

You can then take the broken elements and run them through DOMDocument to tidy them up before writing them back to the XML file.
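As a rough sketch of that DOMDocument step, assuming the extracted content is an HTML fragment: `repair_html_fragment()` is a hypothetical name, and the wrapping `<div>` is only there to give a known container to read the result back from.

```php
<?php
// Hypothetical helper: let DOMDocument's lenient HTML parser close open tags.
// Caveat: loadHTML() does not assume UTF-8 for non-ASCII input unless told to,
// so real content may need an explicit charset hint first.
function repair_html_fragment(string $html): string
{
    $previous = libxml_use_internal_errors(true); // silence parser warnings

    $doc = new DOMDocument();
    $doc->loadHTML('<div>' . $html . '</div>');

    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // Serialize only the children of the wrapper, not the html/body shell.
    $container = $doc->getElementsByTagName('div')->item(0);
    $repaired = '';
    foreach ($container->childNodes as $child) {
        $repaired .= $doc->saveHTML($child);
    }

    return $repaired;
}

echo repair_html_fragment('<p>Unclosed <b>bold text');
```

A fragment that itself contains a stray `</div>` would close the wrapper early, so this is a starting point rather than a robust cleaner.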

However, I’m stuck on how to make the example below return anything other than false:

Sample XML:

<item>
    <content:encoded><![CDATA[
        This is the text with odd characters that are killing 
        simplexml_load_string() (doesn't recover) and breaking 
        (although recoverable) DOMDocument
    ]]></content:encoded>
</item>

If I use the following PHP, I can extract a description node and convert it from:

<description><![CDATA[
    This is some description text with the same problem
]]></description>

to

<description>0</description>

PHP:

preg_replace_callback(
    '/<description>(.*)<\/description>/', // add msU modifiers to fix below
    'node_tidy::callback_description',
    $xml
);

…

private function callback_description($matches=false) {
    if(false !== $matches) {
        $this->arrDescriptions[] = $matches[1];
        return '<description>'.$this->indexDescriptions++.'</description>';
    } else {
        return false;
    }
}

However, when I try to do the same with content:encoded nodes, it returns false. Here’s the related function:

private function callback_content_encoded($matches=false) {
    if(false !== $matches) {
        $this->arrContentEncoded[] = $matches[1];
        return '<content:encoded>'.$this->indexContentEncoded++.'</content:encoded>';
    } else {
        return false;
    }
}

To test whether the colon was the problem, I tried a plain regex:

<?php

$string = '<content:encoded>this is some text</content:encoded>';
preg_match('/<content\:encoded>(.*)<\/content\:encoded>/',$string,$matches);

echo '<pre>';
print_r($matches);
echo '</pre>';

?>

However, that did not print the expected array, with or without escaping the colon as \:. Could someone point out what I am misunderstanding here?

Many thanks!

UPDATE:
Here’s a sample snippet of the real xml that fails, as indicated by @Florent.

http://pastebin.com/7z0f3MJP

UPDATE:
This regex matches the required content:

preg_match('/<content\:encoded>(.*)<\/content\:encoded>/msU',$string,$matches);

The m and s and U modifiers are explained better here:
http://www.php.net/manual/en/reference.pcre.pattern.modifiers.php

I neglected to consider these modifiers.

This regex now returns the expected matches, including for the original problem, so this can be considered resolved.
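For anyone landing here later, this is a minimal self-contained sketch of the placeholder approach with the modifiers applied. `NodeTidySketch` and its method names are illustrative, not the original node_tidy class. The callback is passed as `array($this, ...)` so that `$this` is available inside it, and the m modifier is left out because the pattern has no ^ or $ anchors for it to affect.

```php
<?php
// Hypothetical class for illustration only.
class NodeTidySketch
{
    public $arrContentEncoded = array();
    private $indexContentEncoded = 0;

    public function extract($xml)
    {
        return preg_replace_callback(
            // s: "." also matches newlines, so (.*) can span a multi-line CDATA block.
            // U: quantifiers are ungreedy, so each element is matched separately.
            '/<content:encoded>(.*)<\/content:encoded>/sU',
            array($this, 'callbackContentEncoded'),
            $xml
        );
    }

    private function callbackContentEncoded($matches)
    {
        // Keep the original contents and leave a numeric placeholder behind.
        $this->arrContentEncoded[] = $matches[1];
        return '<content:encoded>' . $this->indexContentEncoded++ . '</content:encoded>';
    }
}

$xml = "<item>\n<content:encoded><![CDATA[\nfirst\n]]></content:encoded>\n</item>\n"
     . "<item>\n<content:encoded><![CDATA[\nsecond\n]]></content:encoded>\n</item>";

$tidy = new NodeTidySketch();
$placeholderXml = $tidy->extract($xml);
echo $placeholderXml;
```

The stored entries in `$tidy->arrContentEncoded` can then be cleaned individually and swapped back in by index.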


1 Answer

  1. Editorial Team
    Added an answer on June 7, 2026 at 4:38 pm

    You should add the following flags to your regex:

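    • m (multiline) makes ^ and $ match at the start and end of each line. On its own it does not let . match newlines; that is the job of s (dotall), which is what allows (.*) to span a multi-line CDATA block
    • u to treat the pattern and subject as UTF-8 (if necessary); note that this is different from uppercase U, which makes quantifiers ungreedy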
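A quick way to see what each flag does is a small sketch like the one below; the subject strings are made up for illustration.

```php
<?php
$subject = "<description>line one\nline two</description>";
$pattern = '/<description>(.*)<\/description>/';

$plain     = preg_match($pattern, $subject);        // 0: "." stops at the newline
$multiline = preg_match($pattern . 'm', $subject);  // 0: m only changes ^ and $
$dotall    = preg_match($pattern . 's', $subject);  // 1: "." now matches newlines

// U flips greediness: without it, (.*) runs to the last closing tag.
$two = '<b>one</b><b>two</b>';
preg_match('/<b>(.*)<\/b>/', $two, $greedy);  // $greedy[1] is 'one</b><b>two'
preg_match('/<b>(.*)<\/b>/U', $two, $lazy);   // $lazy[1] is 'one'

// u treats the subject as UTF-8, so "." matches one character, not one byte.
$eAcute    = "\xC3\xA9";                      // "é" encoded as two UTF-8 bytes
$byteMode  = preg_match('/^.$/', $eAcute);    // 0: two bytes, one "."
$utf8Mode  = preg_match('/^.$/u', $eAcute);   // 1: one character

var_dump($plain, $multiline, $dotall, $greedy[1], $lazy[1], $byteMode, $utf8Mode);
```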