I’m trying to learn the basics of XML processing (with lxml) running Python 2.7.2. I’ve created a REALLY simple starting file, but it’s cratering. The code is:
from lxml import etree
doc = etree.parse('/Users/Dad/Desktop/plc_dmt.xml')
print doc
I’ve tried variations on this code, using different xml files, and also opening the file first before executing the etree.parse() method, but they all yield a similar or identical error message, below:
Traceback (most recent call last):
File "XMLparse_test.py", line 7, in <module>
doc = etree.parse('/Users/Dad/Desktop/plc_dmt.xml')
File "lxml.etree.pyx", line 2954, in lxml.etree.parse (src/lxml/lxml.etree.c:56220)
... {Misc error stuff}
...
lxml.etree.XMLSyntaxError: xmlParsePI : no target name, line 3, column 14
I confirmed that at least some of the XML files were well-formed, at least insofar as they ran correctly on a web server. I don’t understand the error message: what is the target name it is seeking?
Here’s the input xml file.
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<response>
<heartbeat><?--#exec cmd_argument='printf( "0x%02X%02X", InReadUByte( 0 ), InReadUByte( 1 ))'--> </heartbeat>
<dmt node="1">
<address><?--#exec cmd_argument='printf( "0x%02X", InReadUByte( 20 ))'--></address>
<status><?--#exec cmd_argument='printf( "0x%02X", InReadUByte( 21 ))'--></status>
<realflow><?--#exec cmd_argument='printf( "%f", InReadFloat( 22 ))'--></realflow>
<pressure><?--#exec cmd_argument='printf( "0x%02X%02X", InReadUByte( 26 ), InReadUByte( 27 ))'--></pressure>
<temp><?--#exec cmd_argument='printf( "0x%02X%02X", InReadUByte( 28 ), InReadUByte( 29 ))'--></temp>
</dmt>
# Misc stuff pulled out to keep file shorter...
</response>
Much of the embedded code consists of Server Side Include commands for this web server, which is connected to some instrumentation. This file does operate correctly on the server.
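Your XML is not well-formed because you have raw < and > characters inside your elements. The parser reads each <? as the start of a processing instruction and then fails to find a valid target name after it, which is the error you are seeing. They have to be escaped. If they were actually supposed to be comments, each one should open with <!-- rather than <?--.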
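As a rough sketch of the comment form, here is a shortened copy of the file from the question with every <?-- opener changed to <!--, parsed from a string. The names COMMENT_FORM, comment_root and comment_tags are made up for this example, and the fallback to the standard library's ElementTree is only an assumption so the snippet still runs where lxml is not installed.

```python
try:
    from lxml import etree  # the library used in the question
except ImportError:
    # Assumed fallback: the stdlib parser accepts the same fromstring() call.
    import xml.etree.ElementTree as etree

# Shortened copy of the question's file; only the "<?--" openers were changed.
COMMENT_FORM = b"""<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<response>
<heartbeat><!--#exec cmd_argument='printf( "0x%02X%02X", InReadUByte( 0 ), InReadUByte( 1 ))'--> </heartbeat>
<dmt node="1">
<address><!--#exec cmd_argument='printf( "0x%02X", InReadUByte( 20 ))'--></address>
<status><!--#exec cmd_argument='printf( "0x%02X", InReadUByte( 21 ))'--></status>
</dmt>
</response>
"""

# Parse from bytes, since the document carries its own encoding declaration.
comment_root = etree.fromstring(COMMENT_FORM)

# List the element children of <dmt> to show the tree was built.
comment_tags = [child.tag for child in comment_root.find("dmt")]
print(comment_tags)
```

Note that the directives are now ordinary XML comments, so a parser will either keep them as comment nodes or drop them; they no longer count as element text.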
If that was actually supposed to be text, then the < and > need to be escaped as &lt; and &gt;.
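A minimal sketch of the escaped form, assuming the directive is meant to survive as literal element text. It uses escape() from the standard library's xml.sax.saxutils to produce the &lt; and &gt; entities; the variable names and the one-element document are invented for the example, and the ElementTree fallback is again only there in case lxml is missing.

```python
from xml.sax.saxutils import escape

try:
    from lxml import etree  # the library used in the question
except ImportError:
    # Assumed fallback: the stdlib parser accepts the same calls used here.
    import xml.etree.ElementTree as etree

# One directive copied from the question's file.
directive = """<?--#exec cmd_argument='printf( "0x%02X", InReadUByte( 20 ))'-->"""

# escape() turns &, < and > into entities, so the directive becomes plain text.
escaped_doc = (
    '<response><dmt node="1"><address>'
    + escape(directive)
    + "</address></dmt></response>"
)
print(escaped_doc)

# The parser decodes the entities again, giving back the original characters.
escaped_root = etree.fromstring(escaped_doc.encode("utf-8"))
address_text = escaped_root.find("dmt/address").text
print(address_text == directive)
```

With this form the directive text is available through the element's .text attribute after parsing.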
Both forms are well-formed XML and parse without error.
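To see where the "no target name" message comes from, the failure can be reproduced on a single line taken from the question. This is only a sketch: the helper parses() and the names BROKEN, broken_ok and fixed_ok are hypothetical, and it relies on both lxml and the standard library exposing a ParseError exception class on the etree module.

```python
try:
    from lxml import etree  # the library used in the question
except ImportError:
    # Assumed fallback: the stdlib parser also defines etree.ParseError.
    import xml.etree.ElementTree as etree

# One element from the question's file; "<?" starts a processing instruction,
# and "--#exec" is not a legal target name for one.
BROKEN = b"""<response><address><?--#exec cmd_argument='printf( "0x%02X", InReadUByte( 20 ))'--></address></response>"""


def parses(xml_bytes):
    """Return True if xml_bytes is well-formed, False if the parser rejects it."""
    try:
        etree.fromstring(xml_bytes)
        return True
    except etree.ParseError:  # lxml's XMLSyntaxError is a subclass of this
        return False


broken_ok = parses(BROKEN)
# Swapping the opener for a comment opener is the only change made here.
fixed_ok = parses(BROKEN.replace(b"<?--", b"<!--"))
print((broken_ok, fixed_ok))
```

The same replace() call could be applied to the file contents before parsing if editing the file on the server is not an option, though whether the web server still recognizes the directives in the other form is something to check there.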