Here is my XML. I load this text as a string, and I need all of its nested tags and content to be converted with htmlentities():
```xml
<?xml version="1.0" encoding="utf-8"?>
<data>
<target><x id="25e02e3e839c-a1e6b03cb682" pid="NLSheets" name="NLSheets" />Sheets"</target>
<target>"<x id="3510a371bdf8-861b965564ea" pid="NLTable" name="NLTable" />Table"</target>
<target>"<x id="48a1560eaa68-c400c8394f0a" pid="NLCaption" name="NLCaption" />Caption"</target>
</data>
```
I have used the following PHP code for the task:
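```php
function html_entities($matches) {
    // Encode only the captured inner content and put it back into the full match
    return str_replace($matches[1], htmlentities($matches[1]), $matches[0]);
}

function get_tag($tagname, $xml) {
    $pattern = "/<$tagname ?.*>(.*?)<\/$tagname>/";
    // Pass the callback name as a string; a bare html_entities is an undefined constant
    $content = preg_replace_callback($pattern, 'html_entities', $xml);
    return $content;
}

$content = get_tag('target', $str);
echo $content;
```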
Now the problem is with the regular expression. As you can see in the get_tag function, I have used `$pattern = "/<$tagname ?.*>(.*?)<\/$tagname>/";`, which is built at run time as `/<target ?.*>(.*?)<\/target>/`.
Now I am unable to fix the problem: the nested tags are not converted to HTML entities.
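A minimal sketch that runs the question's run-time pattern against the sample XML and shows what the capturing group actually receives (the variable names are illustrative). Because the greedy `.*` in the opening tag backtracks only as far as the last `>` before `</target>`, which is the end of the self-closing `<x ... />`, group 1 holds only the trailing text, so the `<x>` tag never reaches htmlentities().

```php
<?php
// Sample XML from the question, kept verbatim in a nowdoc string.
$str = <<<'XML'
<?xml version="1.0" encoding="utf-8"?>
<data>
<target><x id="25e02e3e839c-a1e6b03cb682" pid="NLSheets" name="NLSheets" />Sheets"</target>
<target>"<x id="3510a371bdf8-861b965564ea" pid="NLTable" name="NLTable" />Table"</target>
<target>"<x id="48a1560eaa68-c400c8394f0a" pid="NLCaption" name="NLCaption" />Caption"</target>
</data>
XML;

// The run-time pattern from the question; `.*` in the opening tag is greedy.
$pattern = '/<target ?.*>(.*?)<\/target>/';
preg_match_all($pattern, $str, $m);

// $m[1] is what the callback would encode: only the text after the
// self-closing <x ... />, because `.*` backtracks to that tag's `>`.
var_dump($m[1]);
```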
Please help
Change the `$pattern` line to this: `$pattern = "/<$tagname ?.*?>(.*?)<\/$tagname>/";`
You need the extra non-greedy modifier (`.*?` instead of `.*`) to prevent the search for the closing part (`>`) of the opening tag from going too far and grabbing your inner content, which would otherwise never be available to the parenthesized group and thus never be passed to htmlentities.

We could improve this a little by adding the `s` modifier at the end, which allows newlines within the content (the dot does not match newlines by default). We could also prevent `/` within the opening tag while still allowing newlines there, allow any kind of whitespace to separate the element name from the attributes, and allow whitespace at the end of the closing tag. The resulting pattern can then be shortened.
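A hedged, runnable sketch of both patterns described above (PHP 7.4+ for the arrow functions). `get_tag` is the question's function with only the `.*` to `.*?` change; the `preg_quote()` call and the arrow-function callback are additions. `get_tag_improved` is a hypothetical reconstruction, since the answer's exact improved code is not preserved: it follows the description (the `s` modifier, a negated class that excludes `/` and `>` but does match newlines, `\s` after the element name and before the closing `>`) and uses `#` as the delimiter so `/` needs no escaping. It also rebuilds each match from captured parts rather than using `str_replace()`.

```php
<?php
// The answer's fix: `.*?` makes the opening tag stop at its first `>`.
function get_tag(string $tagname, string $xml): string
{
    $tag = preg_quote($tagname, '/'); // addition: escape regex metacharacters
    $pattern = "/<$tag ?.*?>(.*?)<\/$tag>/";
    return preg_replace_callback(
        $pattern,
        // Encode only the captured inner content and put it back into the match.
        fn(array $m): string => str_replace($m[1], htmlentities($m[1]), $m[0]),
        $xml
    );
}

// Hypothetical reconstruction of the improved, shortened pattern.
function get_tag_improved(string $tagname, string $xml): string
{
    $tag = preg_quote($tagname, '#');
    // (?:\s[^/>]*)?  whitespace, then attributes without `/` or `>`;
    //                a negated class also matches newlines.
    // (.*?)          non-greedy inner content; `s` lets `.` match newlines.
    // </tag\s*>      closing tag with optional trailing whitespace.
    $pattern = "#(<$tag(?:\s[^/>]*)?>)(.*?)(</$tag\s*>)#s";
    return preg_replace_callback(
        $pattern,
        // Rebuild: opening tag + encoded content + closing tag.
        fn(array $m): string => $m[1] . htmlentities($m[2]) . $m[3],
        $xml
    );
}

$str = <<<'XML'
<?xml version="1.0" encoding="utf-8"?>
<data>
<target><x id="25e02e3e839c-a1e6b03cb682" pid="NLSheets" name="NLSheets" />Sheets"</target>
<target>"<x id="3510a371bdf8-861b965564ea" pid="NLTable" name="NLTable" />Table"</target>
<target>"<x id="48a1560eaa68-c400c8394f0a" pid="NLCaption" name="NLCaption" />Caption"</target>
</data>
XML;

$out = get_tag('target', $str);
echo $out;
```

Excluding `/` means a self-closing `<target/>` is never taken as an opening tag, but it also means an opening tag whose attribute value contains `/` (a URL, for example) is not matched at all.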
It is because of all these possible edge cases that it is safer to use an XML parser. For example, this would not catch:
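The parser route can be sketched with PHP's DOM extension (assumed to be enabled; `encode_children_with_dom` is a hypothetical name). It serializes each element's children back to markup with `DOMDocument::saveXML()`, replaces them with a single text node holding that markup, and lets the serializer escape it on output, so no regular expression has to recognize tags. The escaping comes from the XML serializer rather than htmlentities(), so the output is not necessarily byte-for-byte identical to the regex version.

```php
<?php
// Hypothetical helper: escape the children of every <$tagname> via DOM.
function encode_children_with_dom(string $tagname, string $xml): string
{
    $doc = new DOMDocument();
    if (!$doc->loadXML($xml)) {
        throw new RuntimeException('Input is not well-formed XML');
    }

    // Copy the live node list before changing the tree.
    $elements = iterator_to_array($doc->getElementsByTagName($tagname));
    foreach ($elements as $el) {
        // Serialize the children back to markup ("inner XML").
        $inner = '';
        foreach ($el->childNodes as $child) {
            $inner .= $doc->saveXML($child);
        }
        // Replace the children with one text node holding that markup;
        // saveXML() escapes `<` and `&` in text nodes on output.
        while ($el->firstChild !== null) {
            $el->removeChild($el->firstChild);
        }
        $el->appendChild($doc->createTextNode($inner));
    }
    return $doc->saveXML();
}

$str = <<<'XML'
<?xml version="1.0" encoding="utf-8"?>
<data>
<target><x id="25e02e3e839c-a1e6b03cb682" pid="NLSheets" name="NLSheets" />Sheets"</target>
<target>"<x id="3510a371bdf8-861b965564ea" pid="NLTable" name="NLTable" />Table"</target>
<target>"<x id="48a1560eaa68-c400c8394f0a" pid="NLCaption" name="NLCaption" />Caption"</target>
</data>
XML;

$result = encode_children_with_dom('target', $str);
echo $result;
```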