I’m using lxml to parse and objectify xml files in a path, I have

Question

0

Asked: June 9, 20262026-06-09T05:19:09+00:00 2026-06-09T05:19:09+00:00

I’m using lxml to parse and objectify xml files in a path, I have

0

I’m using lxml to parse and objectify xml files in a path, I have a lot of model and xsd’s, each object model maps to certain defined classes, for example if xml starts with model tag so it is a dataModel and if it starts with page tag it is a viewModel.

My question is how to detect in efficient way that xml file starts with which tag and then parse it with an appropriate xsd file and then objectify it

files = glob(os.path.join('resources/xml', '*.xml'))
for f in files:
    xmlinput = open(f)
    xmlContent = xmlinput.read()

    if xsdPath:
        xsdFile = open(xsdPath)
        # xsdFile should retrieve according to xml content
        schema = etree.XMLSchema(file=xsdFile)

        xmlinput.seek(0)
        myxml = etree.parse(xmlinput)

        try:
            schema.assertValid(myxml)

        except etree.DocumentInvalid as x:
            print "In file %s error %s has occurred." % (xmlPath, x.message)
        finally:
            xsdFile.close()

    xmlinput.close()

Report

Leave an answer
Cancel reply

You must login to add an answer.

Need An Account,

1 Answer

Editorial Team · Answer 1 · 2026-06-09T05:19:11+00:00

I leave aside voluntarily file reading and treatments, to concentrate on your problem:

>>> from lxml.etree import fromstring
>>> # We have XMLs with different root tag
>>> tree1 = fromstring("<model><foo/><bar/></model>")
>>> tree2 = fromstring("<page><baz/><blah/></page>")
>>>
>>> # We have different treatments
>>> def modelTreatement(etree):
...     return etree.xpath('//bar')
...
>>> def pageTreatment(etree):
...     return etree.xpath('//blah')
...
>>> # Here is a recipe to read the root tag
>>> tree1.getroottree().getroot().tag
'model'
>>> tree2.getroottree().getroot().tag
'page'
>>>
>>> # So, by building an appropriated dict :
>>> tag_to_treatment_map = {'model': modelTreatement, 'page': pageTreatment}
>>> # You can run the right method on the right tree
>>> for tree in [tree1, tree2]:
...     tag_to_treatment_map[tree.getroottree().getroot().tag](tree)
...
[<Element bar at 0x24979b0>]
[<Element blah at 0x2497a00>]

Hope this will be useful to someone, even if I had not seen this earlier.

Sign Up

Sign In

Forgot Password

The Archive Base Latest Questions

I’m using lxml to parse and objectify xml files in a path, I have

Leave an answerCancel reply

1 Answer

Leave an answer
Cancel reply