I am trying to parse an XML file with Python using lxml, but I get an error on even basic attempts. I used this post and the lxml tutorials to bootstrap.
My XML file is basically built from records below (I trimmed it down so that it is easier to read):
<?xml version="1.0" ?>
<?xml-stylesheet href="file:///usr/share/nmap/nmap.xsl" type="text/xsl"?>
<nmaprun scanner="nmap" args="nmap -sV -p135,12345 -oX 10.232.0.0.16.xml 10.232.0.0/16" start="1340201347" startstr="Wed Jun 20 16:09:07 2012" version="5.21" xmloutputversion="1.03">
<host>
<hostnames>
<hostname name="host1.example.com" type="PTR"/>
</hostnames>
</host>
</nmaprun>
I run it through this complicated script:
from lxml import etree
d = etree.parse("myfile.xml")
for host in d.findall("host"):
    aa = host.find("hostnames/hostname")
    print aa.attrib["name"]
I get AttributeError: 'NoneType' object has no attribute 'attrib' on the print line.
I checked the value of d, host and aa and they are all defined as Elements.
Upfront apologies if this is something obvious (and it probably is).
EDIT: I added the header of the XML file as requested (I am still reading and rereading the answers :))
Thanks!
Though it would make more sense to use XPath, your code already works fine when standing alone, so long as one handles the case where a host has no hostnames found. For such a host, find() returns None, so check for that before reading attrib.
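A minimal sketch of that check follows. The inline SAMPLE document and the host_names() helper are made up for illustration, with a second host that has no hostname to stand in for the faulty records; the stdlib fallback import is only there so the snippet also runs without lxml installed.

```python
try:
    from lxml import etree
except ImportError:
    # Fallback for illustration only; find()/findall() behave the same here.
    import xml.etree.ElementTree as etree

# Hypothetical sample data: the second host has no <hostname> child.
SAMPLE = """<nmaprun>
  <host>
    <hostnames>
      <hostname name="host1.example.com" type="PTR"/>
    </hostnames>
  </host>
  <host>
    <hostnames/>
  </host>
</nmaprun>"""


def host_names(root):
    """Collect the name of the first hostname of each host, skipping hosts without one."""
    names = []
    for host in root.findall("host"):
        hostname = host.find("hostnames/hostname")
        # find() returns None when nothing matches. Compare with "is not None":
        # an element without children is falsy, so "if hostname:" would be wrong.
        if hostname is not None:
            names.append(hostname.attrib["name"])
    return names


root = etree.fromstring(SAMPLE)
names = host_names(root)
for name in names:
    print(name)
```

With etree.parse("myfile.xml") the same loop applies unchanged, since findall("host") on the parsed tree searches the children of the root element.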
With XPath (doc.xpath() rather than doc.find() or doc.findall()), one could do better, filtering only for hostnames with a name and thus avoiding the faulty records altogether:
- host[hostnames/hostname/@name] will find host elements which have at least one hostnames child containing a hostname with a name attribute.
- //hostnames/hostname/@name will directly return only the names themselves (if using lxml, exposing these as strings).
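Both expressions can be tried on a small inline document. This is a sketch only: SAMPLE is invented for illustration, and it needs lxml itself, because the standard library's ElementTree has no xpath() method.

```python
from lxml import etree

# Hypothetical sample data: only the first host has a named hostname.
SAMPLE = """<nmaprun>
  <host>
    <hostnames>
      <hostname name="host1.example.com" type="PTR"/>
    </hostnames>
  </host>
  <host>
    <hostnames/>
  </host>
</nmaprun>"""

root = etree.fromstring(SAMPLE)

# Relative path, evaluated from the root element: keeps only hosts that
# have at least one hostname carrying a name attribute.
named_hosts = root.xpath("host[hostnames/hostname/@name]")

# Attribute selection: returns the attribute values themselves, which lxml
# exposes as string objects.
names = root.xpath("//hostnames/hostname/@name")

print(len(named_hosts))
for name in names:
    print(name)
```

Because the predicate filters out hosts without a named hostname, no None check is needed in the loop.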