I have this code for automatically indenting XML through PHP:
function xmlpp($xml, $html_output=false) {
    if ($xml == '') return 'NULL';
    try {
        $xml_obj = @new SimpleXMLElement($xml);
    } catch (Exception $ex) {
        // Error parsing xml, return same string
        return ($html_output) ? htmlentities($xml) : $xml;
    }
    $level = 4;
    $indent = 0; // current indentation level
    $pretty = array();
    // get an array containing each XML element
    $xml = explode("\n", preg_replace('/>\s*</', ">\n<", $xml_obj->asXML()));
    // shift off opening XML tag if present
    if (count($xml) && preg_match('/^<\?\s*xml/', $xml[0])) {
        //$pretty[] = array_shift($xml);
        array_shift($xml);
    }
    foreach ($xml as $el) {
        if (preg_match('/^<([\w])+[^>\/]*>$/U', $el)) {
            // opening tag, increase indent
            $pretty[] = str_repeat(' ', $indent) . $el;
            $indent += $level;
        } else {
            if (preg_match('/^<\/.+>$/', $el)) {
                $indent -= $level; // closing tag, decrease indent
            }
            if ($indent < 0) {
                $indent += $level;
            }
            $pretty[] = str_repeat(' ', $indent) . $el;
        }
    }
    $xml = implode("\n", str_replace('"', "'", $pretty));
    return ($html_output) ? htmlentities($xml, ENT_COMPAT, 'UTF-8') : $xml;
}
The issue is that whenever I get an attribute value containing a / character, the indentation level is reduced. For example, the output produced for the following is incorrect:
<function desc='Cancel/Refund'>
<const value='1'/>
<const value='1'/>
<const value='1'/>
</function>
I know the regular expression shouldn’t match the words Cancel/Refund, but it does, and I can’t figure out how to fix this.
Any hints would be appreciated.
The [^\/] at the beginning and end of the regex says to match tags that don’t start with a / and don’t end with a /. This way you only get opening tags and not closing tags or empty tags. The .+ will match anything, so it doesn’t matter whether you have / inside the tag’s attributes, as long as the tag doesn’t start or end with a /.
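
To make the idea concrete, here is a minimal sketch of an opening-tag check along those lines. It is not the exact regex described above: it is a stricter variant that also refuses lines such as <a>text</a>, which start and end without a / but are not bare opening tags. The names is_opening_tag and indent_lines are hypothetical helpers, not part of the original xmlpp function, and the pattern assumes each line contains no > other than the ones that close tags (which should hold for lines produced by splitting asXML() output on >\s*<).

```php
<?php
// Hypothetical helper: true only for a bare opening tag such as
// <function desc='Cancel/Refund'>. A '/' inside an attribute is allowed.
function is_opening_tag(string $el): bool
{
    // '<', then a first character that is not '/', '?', '!' or '>'
    // (rules out closing tags, processing instructions and comments),
    // then any run of non-'>' characters, then a final '>' that is
    // not preceded by '/' (rules out self-closing tags).
    return preg_match('/^<[^\/?!>][^>]*(?<!\/)>$/', $el) === 1;
}

// Hypothetical helper: the indentation loop from xmlpp(), using the
// check above instead of the original opening-tag regex.
function indent_lines(array $lines, int $level = 4): array
{
    $indent = 0;
    $pretty = [];
    foreach ($lines as $el) {
        if (is_opening_tag($el)) {
            // opening tag: print at the current level, then go deeper
            $pretty[] = str_repeat(' ', $indent) . $el;
            $indent += $level;
            continue;
        }
        if (preg_match('/^<\//', $el) === 1) {
            // closing tag: step back out, never below zero
            $indent = max(0, $indent - $level);
        }
        $pretty[] = str_repeat(' ', $indent) . $el;
    }
    return $pretty;
}

$lines = [
    "<function desc='Cancel/Refund'>",
    "<const value='1'/>",
    "</function>",
];
echo implode("\n", indent_lines($lines)), "\n";
```

With this check, the / inside desc='Cancel/Refund' no longer stops the line from being treated as an opening tag, so the const elements end up indented one level beneath function.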