I am using lxml to parse an xsd file and am looking for an


Asked: May 23, 2026 (2026-05-23T19:56:41+00:00)


I am using lxml to parse an xsd file and am looking for an easy way to remove the URL namespace attached to each element name. Here’s the xsd file:

<?xml version="1.0" encoding="utf-8"?>
<xs:schema attributeFormDefault="unqualified" elementFormDefault="qualified" version="2.0" xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="rootelement">
    <xs:complexType>
      <xs:choice maxOccurs="unbounded">
        <xs:element minOccurs="1" maxOccurs="1" name="element1">
          <xs:complexType>
            <xs:all>
              <xs:element name="subelement1" type="xs:string" />
              <xs:element name="subelement2" type="xs:integer" />
              <xs:element name="subelement3" type="xs:dateTime" />
            </xs:all>
            <xs:attribute name="id" type="xs:integer" use="required" />
          </xs:complexType>
        </xs:element>
      </xs:choice>
      <xs:attribute fixed="2.0" name="version" type="xs:decimal" use="required" />
    </xs:complexType>
  </xs:element>
</xs:schema>

and using this code:

from lxml import etree

parser = etree.XMLParser()
data = etree.parse("testschema.xsd", parser)
root = data.getroot()
rootelement = root.getchildren()[0]
rootelementattribute = rootelement.getchildren()[0].getchildren()[1]
print("root element tags")
print(rootelement[0].tag)
print(rootelementattribute.tag)
elements = rootelement.getchildren()[0].getchildren()[0].getchildren()
elements_attribute = elements[0].getchildren()[0].getchildren()[1]
print("element tags")
print(elements[0].tag)
print(elements_attribute.tag)
subelements = elements[0].getchildren()[0].getchildren()[0].getchildren()
print("subelements")
print(subelements)

I get the following output:

root element tags
{http://www.w3.org/2001/XMLSchema}complexType
{http://www.w3.org/2001/XMLSchema}attribute
element tags
{http://www.w3.org/2001/XMLSchema}element
{http://www.w3.org/2001/XMLSchema}attribute
subelements
[<Element {http://www.w3.org/2001/XMLSchema}element at 0x7f2998fb16e0>, <Element {http://www.w3.org/2001/XMLSchema}element at 0x7f2998fb1780>, <Element {http://www.w3.org/2001/XMLSchema}element at 0x7f2998fb17d0>]

I don’t want “{http://www.w3.org/2001/XMLSchema}” to appear at all when I pull the tag data (altering the xsd file is not an option). I need the xsd tag info because I am using it to validate column names from a series of flat files. At the “element” level I’m pulling multiple elements as well as their subelements, and I use a dictionary of these to validate the columns. Also, any suggestions on improving the code above would be greatly appreciated, such as a way to use fewer “getchildren” calls, or just to make it more organized.


1 Answer

Editorial Team · Answer 1 · 2026-05-23T19:56:42+00:00

I’d use:

print(elem.tag.split('}')[-1])

But you could also use the xpath function local-name():

print(elem.xpath('local-name()'))
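
To apply the split approach to many elements, it can be wrapped in a small helper. The following is only a sketch: the name local_name and the inline XSD sample are made up for illustration, and the import falls back to the standard library's ElementTree (which uses the same {uri}name tag format) so that it runs even where lxml is not installed.

```python
try:
    from lxml import etree
except ImportError:  # stdlib fallback; tags use the same {uri}name format
    import xml.etree.ElementTree as etree

# Hypothetical, cut-down schema used only for this illustration.
XSD = b"""<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="rootelement"/>
</xs:schema>"""


def local_name(elem):
    """Return elem's tag without the leading {namespace-uri} part."""
    tag = elem.tag
    if not isinstance(tag, str):
        # Comments and processing instructions have a non-string tag.
        return None
    # rpartition leaves the tag unchanged when it contains no '}'.
    return tag.rpartition("}")[2]


root = etree.fromstring(XSD)
names = [local_name(el) for el in root.iter()]
print(names)
```

With lxml itself, etree.QName(elem).localname returns the same value without any string handling.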

As for fewer getchildren() calls: just leave them out. getchildren() is a deprecated way of making a list of the direct children (you should just use list(elem) instead if you actually want this).

You can iterate over an element, or index it, directly. For example, rootelement[0] gives you the first child element of rootelement, and it is more efficient than rootelement.getchildren()[0], which acts like list(rootelement) and builds a new list first.
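
As a rough sketch of how this could look for the schema in the question, the example below indexes and iterates elements directly and collects the subelement names into a dictionary. The find()/findall() calls with a prefix map are an addition not mentioned above, offered as one way to avoid long index chains. The names NS and columns and the shortened inline schema are assumptions made for the example.

```python
try:
    from lxml import etree
except ImportError:  # stdlib fallback; this part of the API is shared
    import xml.etree.ElementTree as etree

# Prefix map for find()/findall(); the prefix "xs" is an arbitrary choice.
NS = {"xs": "http://www.w3.org/2001/XMLSchema"}

# Shortened version of the schema from the question.
XSD = b"""<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="rootelement">
    <xs:complexType>
      <xs:choice maxOccurs="unbounded">
        <xs:element name="element1">
          <xs:complexType>
            <xs:all>
              <xs:element name="subelement1" type="xs:string"/>
              <xs:element name="subelement2" type="xs:integer"/>
            </xs:all>
            <xs:attribute name="id" type="xs:integer" use="required"/>
          </xs:complexType>
        </xs:element>
      </xs:choice>
    </xs:complexType>
  </xs:element>
</xs:schema>"""

root = etree.fromstring(XSD)
rootelement = root[0]  # index the element directly, no getchildren()
choice = rootelement.find("xs:complexType/xs:choice", namespaces=NS)

columns = {}
for element in choice:  # iterating an element yields its direct children
    subelements = element.findall("xs:complexType/xs:all/xs:element",
                                  namespaces=NS)
    columns[element.get("name")] = [sub.get("name") for sub in subelements]

print(columns)
```

Reading the name attribute with get("name") also sidesteps the namespace question for the validation dictionary, since attribute values carry no namespace prefix.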
