If I output the string before adding it as a text node to the DOMDocument tree, I can see that the original UTF-8 encoding is preserved. All umlauts are UTF-8 encoded – for sure.
Then I add the string and output the DOM tree through saveXML(), and all umlauts have been replaced by their respective numeric entities.
I create the DOMDocument like this: $xmlDoc = new \DOMDocument('1.0', 'UTF-8');
Shouldn’t saveXML() then leave all UTF-8 encoded chars alone as long as they aren’t XML-special chars?
I don’t think that this is a bug.
DOMDocument::loadXML() simply seems to override the internal version and encoding settings with the ones detected in the given XML string, effectively overriding everything that has been set in the DOMDocument constructor. So if you’re using DOMDocument::loadXML(), you have to ensure that the XML string contains a valid XML declaration that names the encoding. The constructor arguments are used when you build the document from scratch.
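To make the difference concrete, here is a minimal sketch comparing the three cases. The element name, the sample string and the variable names are made up for illustration, and the umlaut is written as an escape sequence (PHP 7+) so the example doesn’t depend on how the source file is saved. The exact output may vary with your libxml version.

```php
<?php
// Hypothetical sample data: "Müller" with the umlaut as a UTF-8 escape.
$name = "M\u{00FC}ller";

// Case 1: document built from scratch. The constructor arguments apply,
// so saveXML() should emit the umlaut as raw UTF-8.
$built = new DOMDocument('1.0', 'UTF-8');
$root  = $built->appendChild($built->createElement('name'));
$root->appendChild($built->createTextNode($name));
$builtXml = $built->saveXML();

// Case 2: loadXML() with a string that has no XML declaration.
// The constructor's encoding is discarded, and non-ASCII characters
// come out as numeric character references.
$loaded = new DOMDocument('1.0', 'UTF-8');
$loaded->loadXML("<name>{$name}</name>");
$loadedXml = $loaded->saveXML();

// Case 3: loadXML() with a declaration that names the encoding.
// The encoding is taken from the string, so the umlaut stays raw UTF-8.
$declared = new DOMDocument();
$declared->loadXML('<?xml version="1.0" encoding="UTF-8"?>' . "<name>{$name}</name>");
$declaredXml = $declared->saveXML();

echo $builtXml, $loadedXml, $declaredXml;
```

Comparing the three outputs should show the entities only in the second case, which matches the explanation above.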