In [1]: from lxml import etree
I’ve got an HTML document:
In [2]: root = etree.fromstring(u'''<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML//EN">\n<HTML></HTML>''', etree.HTMLParser())
Its doctype is parsed correctly:
In [3]: root.getroottree().docinfo.doctype
Out[3]: u'<!DOCTYPE html PUBLIC "-//IETF//DTD HTML//EN">'
But when I serialize the tree, the doctype is lost:
In [4]: etree.tostring(root.getroottree(), method='html')
Out[4]: '<html></html>'
What should I do to get that doctype serialized?
Debian GNU/Linux, Sid. Python 2.6.6. lxml 2.2.8-2.
This is a bug in lxml, as mentioned in a comment on another answer: the doctype is missing from the serialized output. A fix was made in February 2015 and is to be released in lxml 3.5.
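Until a release with the fix is available, one possible workaround is to prepend the parsed doctype yourself. The following is only a sketch, not part of lxml: `serialize_with_doctype` is a hypothetical helper name, and the code assumes Python 3 with a reasonably recent lxml, where `encoding='unicode'` returns a text string (on old Python 2 setups you would likely pass `encoding=unicode` instead). It checks whether the output already begins with a doctype, so it should not duplicate it on versions that contain the fix.

```python
from lxml import etree


def serialize_with_doctype(tree):
    """Serialize an lxml ElementTree as HTML, keeping its doctype.

    Hypothetical helper: prepends tree.docinfo.doctype when the
    serializer did not emit a doctype on its own.
    """
    body = etree.tostring(tree, method='html', encoding='unicode')
    doctype = tree.docinfo.doctype  # empty string if the document has none
    # Fixed lxml releases already emit the doctype; only add it when missing.
    already_present = body.lstrip().upper().startswith('<!DOCTYPE')
    if doctype and not already_present:
        body = doctype + '\n' + body
    return body


root = etree.fromstring(
    '<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML//EN">\n<HTML></HTML>',
    etree.HTMLParser())
result = serialize_with_doctype(root.getroottree())
print(result)
```

The helper takes the tree rather than the root element because `docinfo` lives on the tree, which is also why the question calls `root.getroottree()` before serializing.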