Let’s suppose we have the XML file with the structure as follows. <?xml version=1.0

Question

0

Asked: June 12, 20262026-06-12T05:01:10+00:00 2026-06-12T05:01:10+00:00

Let’s suppose we have the XML file with the structure as follows. <?xml version=1.0

0

Let’s suppose we have the XML file with the structure as follows.

<?xml version="1.0" ?> 
<searchRetrieveResponse xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.loc.gov/zing/srw/ http://www.loc.gov/standards/sru/sru1-1archive/xml-files/srw-types.xsd" xmlns="http://www.loc.gov/zing/srw/">
  <records xmlns:ns1="http://www.loc.gov/zing/srw/">
    <record>
      <recordData>
        <record xmlns="">
          <datafield tag="000">
            <subfield code="a">123</subfield>
            <subfield code="b">456</subfield>
          </datafield>
          <datafield tag="001">
            <subfield code="a">789</subfield>
            <subfield code="b">987</subfield>
          </datafield>
        </record>
      </recordData>
    </record>
    <record>
      <recordData>
        <record xmlns="">
          <datafield tag="000">
            <subfield code="a">123</subfield>
            <subfield code="b">456</subfield>
          </datafield>
          <datafield tag="001">
            <subfield code="a">789</subfield>
            <subfield code="b">987</subfield>
          </datafield>
        </record>
      </recordData>
    </record>
  </records>
</searchRetrieveResponse>

I need to parse out:

The content of the “subfield” (e.g. 123 in the example above) and
Attribute values (e.g. 000 or 001)

I wonder how to do that using lxml and XPath. Pasted below is my initial code and I kindly ask someone to explain me, how to parse out values.

import urllib, urllib2
from lxml import etree    

url = "https://dl.dropbox.com/u/540963/short_test.xml"
fp = urllib2.urlopen(url)
doc = etree.parse(fp)
fp.close()

ns = {'xsi':'http://www.loc.gov/zing/srw/'}

for record in doc.xpath('//xsi:record', namespaces=ns):
    print record.xpath("xsi:recordData/record/datafield[@tag='000']", namespaces=ns)

Report

Leave an answer
Cancel reply

You must login to add an answer.

Need An Account,

1 Answer

Editorial Team · Answer 1 · 2026-06-12T05:01:11+00:00

I would be more direct in your XPath: go straight for the elements you want, in this case datafield.

>>> for df in doc.xpath('//datafield'):
        # Iterate over attributes of datafield
        for attrib_name in df.attrib:
                print '@' + attrib_name + '=' + df.attrib[attrib_name]

        # subfield is a child of datafield, and iterate
        subfields = df.getchildren()
        for subfield in subfields:
                print 'subfield=' + subfield.text

Also, lxml appears to let you ignore the namespace, maybe because your example only uses one namespace?

Sign Up

Sign In

Forgot Password

The Archive Base Latest Questions

Let’s suppose we have the XML file with the structure as follows. <?xml version=1.0

Leave an answerCancel reply

1 Answer

Leave an answer
Cancel reply