I’m currently working on modifying XML documents (adding elements, adding attributes, etc.), so I first need to parse the XML before working on it. However, lxml seems to be removing the <?xml ...> element. For example:
from lxml import etree
tree = etree.fromstring('<?xml version="1.0" encoding="utf-8"?><dmodule>test</dmodule>', etree.XMLParser())
print etree.tostring(tree)
will result in
<dmodule>test</dmodule>
Does anyone know why the <?xml ...> element is being removed? I thought encoding tags were valid XML. Thanks for your time.
The <?xml ...> line is an XML declaration, so it’s not strictly an element. It just gives info about the XML tree below it, which is why it does not show up as part of the parsed tree.
If you need to print it out with lxml, there is some info here about the xml_declaration=True flag you can pass to etree.tostring(): http://lxml.de/api.html#serialisation
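As a rough illustration of the serialisation flag mentioned above (in current lxml the keyword is spelled xml_declaration=True), here is a minimal sketch, assuming Python 3. The helper name serialize_with_declaration and the SOURCE constant are made up for this example. As an added assumption, the import falls back to the standard library's xml.etree.ElementTree, whose tostring() accepts the same keyword from Python 3.8 on, so the snippet still runs where lxml is not installed.

```python
try:
    from lxml import etree
except ImportError:
    # Assumption: fall back to the stdlib API (needs Python 3.8+ for xml_declaration).
    import xml.etree.ElementTree as etree

# Same document as in the question, given as bytes.
SOURCE = b'<?xml version="1.0" encoding="utf-8"?><dmodule>test</dmodule>'


def serialize_with_declaration(xml_bytes):
    """Parse xml_bytes and serialise it again with an XML declaration (hypothetical helper)."""
    # Bytes input: on Python 3, lxml rejects str input that carries an encoding declaration.
    root = etree.fromstring(xml_bytes)
    # xml_declaration=True asks the serialiser to emit the <?xml ...?> line.
    return etree.tostring(root, encoding="utf-8", xml_declaration=True)


if __name__ == "__main__":
    print(serialize_with_declaration(SOURCE).decode("utf-8"))
```

The declaration in the output is generated by the serialiser from the encoding argument rather than copied from the input, so its quoting may differ from the original document.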