I am using lxml to parse an xsd file and am looking for an easy way to remove the URL namespace attached to each element name. Here’s the xsd file:
<?xml version="1.0" encoding="utf-8"?>
<xs:schema attributeFormDefault="unqualified" elementFormDefault="qualified" version="2.0" xmlns:xs="http://www.w3.org/2001/XMLSchema">
<xs:element name="rootelement">
<xs:complexType>
<xs:choice maxOccurs="unbounded">
<xs:element minOccurs="1" maxOccurs="1" name="element1">
<xs:complexType>
<xs:all>
<xs:element name="subelement1" type="xs:string" />
<xs:element name="subelement2" type="xs:integer" />
<xs:element name="subelement3" type="xs:dateTime" />
</xs:all>
<xs:attribute name="id" type="xs:integer" use="required" />
</xs:complexType>
</xs:element>
</xs:choice>
<xs:attribute fixed="2.0" name="version" type="xs:decimal" use="required" />
</xs:complexType>
</xs:element>
</xs:schema>
and using this code:
from lxml import etree
parser = etree.XMLParser()
data = etree.parse(open("testschema.xsd"),parser)
root = data.getroot()
rootelement = root.getchildren()[0]
rootelementattribute = rootelement.getchildren()[0].getchildren()[1]
print "root element tags"
print rootelement[0].tag
print rootelementattribute.tag
elements = rootelement.getchildren()[0].getchildren()[0].getchildren()
elements_attribute = elements[0].getchildren()[0].getchildren()[1]
print "element tags"
print elements[0].tag
print elements_attribute.tag
subelements = elements[0].getchildren()[0].getchildren()[0].getchildren()
print "subelements"
print subelements
I get the following output
root element tags
{http://www.w3.org/2001/XMLSchema}complexType
{http://www.w3.org/2001/XMLSchema}attribute
element tags
{http://www.w3.org/2001/XMLSchema}element
{http://www.w3.org/2001/XMLSchema}attribute
subelements
[<Element {http://www.w3.org/2001/XMLSchema}element at 0x7f2998fb16e0>, <Element {http://www.w3.org/2001/XMLSchema}element at 0x7f2998fb1780>, <Element {http://www.w3.org/2001/XMLSchema}element at 0x7f2998fb17d0>]
I don’t want “{http://www.w3.org/2001/XMLSchema}” to appear at all when I pull the tag data (altering the xsd file is not an option). The reason I need the xsd tag info is that I am using this to validate column names from a series of flat files. On the “element” level there are multiple elements that I’m pulling, as well as subelements, which I am using a dictionary to validate columns. Also, any suggestions on improving the code above would be greatly, such as a way to use fewer “getchildren” calls, or just make it more organized.
I’d use:
But you could also use the xpath function
local-name():As for fewer
getchildren()calls: just leave them out.getchildren()is a deprecated way of making a list of the direct children (you should just uselist(elem)instead if you actually want this).You can iterate over, or use an index on an element directly. For example:
rootelement[0]will give you the first child element ofrootelement(but more efficient than if you were userootelement.getchildren()[0], because this would act likelist(rootelement)and create a new list first)