I am trying to parse the XML feed provided by the Google APIs for their profiles. The XML looks like this:
<ns0:feed ns1:etag="W/&quot;Dk8BQ3o8eCt7I2A9WhRUE0g.&quot;">
<ns0:updated>2012-01-23T21:40:52.470Z</ns0:updated>
<ns0:category scheme="http://schemas.google.com/g/2005#kind" term="http://schemas.google.com/contact/2008#profile"/>
<ns0:id>domain.com</ns0:id>
<ns0:generator uri="http://www.google.com/m8/feeds" version="1.0">Contacts</ns0:generator>
<ns0:author>
<ns0:name>domain.com</ns0:name>
</ns0:author>
<ns0:link href="http://www.google.com/" rel="alternate" type="text/html"/>
<ns0:link href="https://www.google.com/m8/feeds/profiles/domain/domain.com/full" rel="http://schemas.google.com/g/2005#feed" type="application/atom+xml"/>
<ns0:link href="https://www.google.com/m8/feeds/profiles/domain/domain.com/full/batch" rel="http://schemas.google.com/g/2005#batch" type="application/atom+xml"/>
<ns0:link href="https://www.google.com/m8/feeds/profiles/domain/domain.com/full?max-results=300" rel="self" type="application/atom+xml"/>
<ns2:startIndex>1</ns2:startIndex>
<ns2:itemsPerPage>300</ns2:itemsPerPage>
<ns0:entry ns1:etag="&quot;URRaQR4KTit7I2A4&quot;">
<ns0:category scheme="http://schemas.google.com/g/2005#kind" term="http://schemas.google.com/contact/2008#profile"/>
<ns0:id>http://www.google.com/m8/feeds/profiles/domain/domain.com/full/pname</ns0:id>
<ns1:name>
<ns1:familyName>Name</ns1:familyName>
<ns1:fullName>Person Name</ns1:fullName>
<ns1:givenName>Robert</ns1:givenName>
</ns1:name>
<ns0:updated>2012-01-23T21:40:52.597Z</ns0:updated>
<ns1:organization primary="true" rel="http://schemas.google.com/g/2005#work">
<ns1:orgTitle>JobField</ns1:orgTitle>
<ns1:orgDepartment>DepartmentField</ns1:orgDepartment>
<ns1:orgName>CompanyField</ns1:orgName>
</ns1:organization>
<ns3:status indexed="true"/>
<ns0:title>Person Name</ns0:title>
<ns0:link href="https://www.google.com/m8/feeds/photos/profile/domain.com/pname" rel="http://schemas.google.com/contacts/2008/rel#photo" type="image/*"/>
<ns0:link href="https://www.google.com/m8/feeds/profiles/domain/domain.com/full/pname" rel="self" type="application/atom+xml"/>
<ns0:link href="https://www.google.com/m8/feeds/profiles/domain/domain.com/full/pname" rel="edit" type="application/atom+xml"/>
<ns1:email address="pname@gapps.domain.com" rel="http://schemas.google.com/g/2005#other"/>
<ns1:email address="pname@domain.com" primary="true" rel="http://schemas.google.com/g/2005#other"/>
<ns4:edited>2012-01-23T21:40:52.597Z</ns4:edited>
</ns0:entry>
I only need the name fields and the fields under the organization element. My question is about the proper way to do this. I have never had to parse XML before, and I see people recommending ElementTree, Stone Soup, SAX, and so on. I have this so far:
import xml.dom.minidom

def explore_children(nodelist, inset):
    # Recursively print each element's tag name, indented by depth
    for subnode in nodelist:
        if subnode.nodeType == subnode.ELEMENT_NODE:
            which = subnode.tagName
            called = ""  # in case it's not an img or title
            if which == "img":
                called = subnode.getAttribute("name")
            if which == "title":
                called = subnode.getAttribute("text")
            print(inset + which + " " + called)
            explore_children(subnode.childNodes, " " + inset)
        if subnode.nodeType == subnode.TEXT_NODE:
            pass

fh = open("c:\\python27\\junk.xml", "r")
doc = xml.dom.minidom.parse(fh)
explore_children(doc.childNodes, "")
This prints every element's tag name to the console, along with the name attribute of any img element and the text attribute of any title element. What I want is all the name and organization text from a record on one line, and I am completely lost. Any help would be much appreciated.
You don't have to do this manually; use Google's gdata library:
then:
More examples: http://code.google.com/googleapps/domain/profiles/developers_guide.html (see the Python tabs in all examples)
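
If you would rather stay with the standard library, here is a minimal sketch using ElementTree (Python 3) that prints the name and organization text of each entry on one line. It rests on some assumptions: the snippet in the question does not show the namespace declarations, so I am assuming ns0 is the Atom namespace and ns1 is the Google Data (gd) namespace, and that the real feed declares them. The names profile_lines, FIELDS and SAMPLE are made up for this illustration, and SAMPLE is a trimmed-down stand-in for the real feed.

```python
import xml.etree.ElementTree as ET

# Assumed namespace URIs: the feed in the question shows only the ns0/ns1
# prefixes, not their declarations. These are the usual Atom and gd URIs.
NS = {
    "atom": "http://www.w3.org/2005/Atom",
    "gd": "http://schemas.google.com/g/2005",
}

# Paths, relative to an entry, of the fields the question asks about.
FIELDS = [
    "gd:name/gd:givenName",
    "gd:name/gd:familyName",
    "gd:name/gd:fullName",
    "gd:organization/gd:orgTitle",
    "gd:organization/gd:orgDepartment",
    "gd:organization/gd:orgName",
]


def profile_lines(xml_text):
    """Yield one tab-separated line of name and organization text per entry."""
    root = ET.fromstring(xml_text)
    for entry in root.iter("{%s}entry" % NS["atom"]):
        # findtext returns the default ("") when an element is missing
        values = [entry.findtext(path, default="", namespaces=NS)
                  for path in FIELDS]
        yield "\t".join(v.strip() for v in values)


# Hypothetical sample shaped like the feed in the question.
SAMPLE = """<feed xmlns="http://www.w3.org/2005/Atom"
      xmlns:gd="http://schemas.google.com/g/2005">
  <entry>
    <gd:name>
      <gd:familyName>Name</gd:familyName>
      <gd:fullName>Person Name</gd:fullName>
      <gd:givenName>Robert</gd:givenName>
    </gd:name>
    <gd:organization primary="true">
      <gd:orgTitle>JobField</gd:orgTitle>
      <gd:orgDepartment>DepartmentField</gd:orgDepartment>
      <gd:orgName>CompanyField</gd:orgName>
    </gd:organization>
  </entry>
</feed>"""

for line in profile_lines(SAMPLE):
    print(line)
```

To run it against a file instead, read the file's contents and pass them to profile_lines. ElementTree matches on namespace URIs rather than prefixes, so it does not matter that the real feed calls them ns0 and ns1, as long as the URIs are the ones assumed above.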