I’m using lxml as follows to parse an exported XML file from another system:
from lxml import etree
xmldoc = open(filename)
etree.parse(xmldoc)
But I'm getting:
lxml.etree.XMLSyntaxError: Entity 'eacute' not defined, line 4495, column 46
Obviously it’s having problems with Unicode entity names – but how would I get around this? Via open() or parse()?
Edit: I had forgotten to include my DTD in the same folder – it’s there now and has the following declaration:
<!ENTITY eacute "é">
and is referred to (and always was) in xmldoc as so:
<?xml version="1.0" encoding="ISO-8859-1" ?>
<!DOCTYPE DScribeDatabase SYSTEM "foo.dtd">
Yet I still get the same problem … does the DTD need to be declared in Python too?
&eacute; is not a predefined entity in XML. To include an &eacute; entity reference in an XML file, the file must have a <!DOCTYPE> declaration pointing to a DTD (such as an XHTML 1.0 DTD) that defines the entity.
If the XML uses &eacute; but doesn’t have a <!DOCTYPE>, it is not well-formed and the system that exported it needs to be fixed.
(There isn’t a good reason to use an entity reference to represent é in an XML file. The character reference &#233; is understood everywhere without entity definitions, if the file can’t simply include a raw UTF-8 é for some reason.)
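On the Python side, lxml does not read an external DTD unless the parser is told to, which would explain why the error persists even with foo.dtd sitting next to the XML file. Here is a minimal sketch, assuming the DTD is reachable relative to the XML file and that the entity is declared with a character reference. The helper names, the demo.xml file and its <name> element are made up for illustration; load_dtd and no_network are real XMLParser options.

```python
import os
import tempfile

from lxml import etree


def parse_with_dtd(path):
    """Parse an XML file, loading the external DTD named in its DOCTYPE."""
    # load_dtd=True asks the parser to read the external subset (foo.dtd),
    # so entities declared there are known while parsing.
    # no_network=True keeps the DTD lookup on the local filesystem.
    parser = etree.XMLParser(load_dtd=True, no_network=True)
    # Passing the path (rather than an open file object) lets the relative
    # SYSTEM identifier "foo.dtd" be resolved against the file's location.
    return etree.parse(path, parser)


def build_demo_files(folder):
    """Write a hypothetical foo.dtd and demo.xml side by side; return the XML path."""
    with open(os.path.join(folder, "foo.dtd"), "w", encoding="ascii") as dtd:
        dtd.write('<!ENTITY eacute "&#233;">\n')
    xml_path = os.path.join(folder, "demo.xml")
    with open(xml_path, "w", encoding="ascii") as xml:
        xml.write(
            '<?xml version="1.0" encoding="ISO-8859-1" ?>\n'
            '<!DOCTYPE DScribeDatabase SYSTEM "foo.dtd">\n'
            "<DScribeDatabase><name>caf&eacute;</name></DScribeDatabase>\n"
        )
    return xml_path


with tempfile.TemporaryDirectory() as folder:
    tree = parse_with_dtd(build_demo_files(folder))
    name_text = tree.findtext("name")
    print(name_text)
```

For the real export, the equivalent call would be etree.parse(filename, etree.XMLParser(load_dtd=True)). If the parser still reports the entity as undefined, it is worth checking that foo.dtd is actually found at the path the DOCTYPE implies.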