I just spent quite some time trying to remove a namespaced attribute from a DOMNode, and it simply does not work.
The XML is generated from a database and looks like this:
<dictionary>
<row xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<table>answers</table>
<entity>Answer</entity>
</row>
<row xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<table>file_trans</table>
<entity>FileTrans</entity>
</row>
...
</dictionary>
The attribute I am trying to remove is obviously “xmlns:xsi”. The PostgreSQL database adds it automatically and I was unable to remove it there, so I am trying to do the job in PHP.
I load the XML as a DOMDocument and then loop over all the row elements with foreach:
$xml = new DOMDocument();
$xml->loadXML($tablesInXml['xmlelement'], LIBXML_NOBLANKS);
foreach($xml->documentElement->childNodes as $row) {
$row->removeAttribute('xsi'); // not working
$row->removeAttribute('xmlns:xsi'); // not working
...
I even inspected the attributes property of the DOMNode: it contains no attributes at all and reports a length of 0.
Is this a bug in PHP 5.3? Does anybody know what else I can do?
Thanks for any answer
You cannot do this in a trivial way with DOMDocument. Those are not really "attributes" (they are not visible as attributes in the DOM and are not part of the XML Infoset). They are rather namespace declarations and have no existence outside of the XML serialization. Most importantly, they are not represented in the DOM in any way.
libxml2 (the underlying XML library for DOMDocument) keeps track of these "namespace nodes" internally but does not expose a public interface to them. Thus if you clone or import a node, the XML namespace declaration will follow even though you can’t see it.
It seems there was a bug in older versions of PHP where you could delete these nodes using removeAttributeNS, but this has been fixed. See the comment on the PHP documentation for this method.
My opinion: you should not try to get rid of these nodes. It’s not worth your time and it doesn’t hurt anything to leave them.
However, if you really want to get rid of them, you have to use another approach. One way to do it is by manually deep-copying the entire DOM tree to a new DOM document. If you use createElementNS and setAttributeNS when copying (instead of using importNode or cloneNode), the hidden namespace nodes won’t be created in your copy. I’m not going to write the code for you to do this because it will be tedious.
This Stack Overflow answer suggests an XSLT solution. I’m not sure if it will work since XSLT 1.0 doesn’t expose these namespace declaration "nodes" either.
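The manual deep copy described above could look roughly like the following. This is only a sketch under a few assumptions: the helper name copyWithoutUnusedNamespaces is made up for illustration, only element, text, CDATA and comment nodes are copied, and the input is the sample document from the question. Elements and attributes that really are in a namespace still get their declaration, because createElementNS and setAttributeNS add it where it is needed.

```php
<?php
// Hypothetical helper (not part of the DOM API): rebuilds $source inside
// $target node by node, without calling importNode() or cloneNode(), so
// unused namespace declarations are never carried over.
function copyWithoutUnusedNamespaces(DOMNode $source, DOMDocument $target)
{
    switch ($source->nodeType) {
        case XML_ELEMENT_NODE:
            $ns = $source->namespaceURI;
            $copy = ($ns === null || $ns === '')
                ? $target->createElement($source->nodeName)
                : $target->createElementNS($ns, $source->nodeName);

            // Namespace declarations do not show up in ->attributes,
            // so only real attributes are copied here.
            foreach ($source->attributes as $attr) {
                $attrNs = $attr->namespaceURI;
                if ($attrNs === null || $attrNs === '') {
                    $copy->setAttribute($attr->nodeName, $attr->value);
                } else {
                    $copy->setAttributeNS($attrNs, $attr->nodeName, $attr->value);
                }
            }

            foreach ($source->childNodes as $child) {
                $childCopy = copyWithoutUnusedNamespaces($child, $target);
                if ($childCopy !== null) {
                    $copy->appendChild($childCopy);
                }
            }
            return $copy;

        case XML_TEXT_NODE:
            return $target->createTextNode($source->nodeValue);
        case XML_CDATA_SECTION_NODE:
            return $target->createCDATASection($source->nodeValue);
        case XML_COMMENT_NODE:
            return $target->createComment($source->nodeValue);
        default:
            // Other node types (processing instructions, entity
            // references, ...) are skipped in this sketch.
            return null;
    }
}

// Sample input modelled on the XML from the question.
$source = new DOMDocument();
$source->loadXML(
    '<dictionary>'
    . '<row xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">'
    . '<table>answers</table><entity>Answer</entity>'
    . '</row>'
    . '</dictionary>'
);

$clean = new DOMDocument('1.0', 'UTF-8');
$clean->appendChild(copyWithoutUnusedNamespaces($source->documentElement, $clean));

echo $clean->saveXML();
```

The copy lives in a separate DOMDocument, so anything that held references to nodes of the original document has to be pointed at the new tree.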
UPDATE
If you are OK with other things being done to your XML besides just removing the redundant namespace declarations, you can try XML canonicalization. The purpose of Canonical XML is to ensure that the same XML infoset always generates the same XML output string. (This is useful for things like comparing XML files or creating checksum hashes.) But it also does things like never using self-closing tags.
Try it and see:
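A minimal sketch using the sample document from the question. It assumes exclusive canonicalization (passing true as the first argument to C14N()), which only writes namespace declarations that are visibly used by an element or attribute. With the default inclusive mode, a declaration such as xmlns:xsi on each row would likely be kept, because no ancestor already declares it.

```php
<?php
// Sample input modelled on the XML from the question.
$xml = '<dictionary>'
    . '<row xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">'
    . '<table>answers</table><entity>Answer</entity>'
    . '</row>'
    . '</dictionary>';

$doc = new DOMDocument();
$doc->loadXML($xml);

// First argument true = exclusive canonicalization; comments are
// dropped by default (second argument false).
$canonical = $doc->C14N(true);
echo $canonical, "\n";

// Reload the canonical string if a DOMDocument is needed again.
$reloaded = new DOMDocument();
$reloaded->loadXML($canonical);
```

Note that the canonical form has no XML declaration and writes empty elements as a start tag and end tag pair, so the output may differ from the input in more ways than the missing xmlns:xsi.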
Documentation:
DOMNode::C14N()
DOMNode::C14NFile()