I am parsing an XML feed from Google using BeautifulStoneSoup and Python, and it works great. I am also creating a CSV file and uploading it to Google Docs, which works fine as well. The problem is that when I come across an empty text attribute in the XML, the parser just stops. It is not a problem now, because all of the attributes have data, but the first time they don’t, it will break.
The code:
import atom
import gdata.auth
import gdata.contacts
import gdata.contacts.client
import gdata.docs.service
import gdata.docs.data
from BeautifulSoup import BeautifulStoneSoup as Soup
import csv
email = 'admin@domain.com'
password = 'password'
domain = 'domain.com'
ms_client = gdata.docs.service.DocsService()
gd_client = gdata.contacts.client.ContactsClient(domain=domain)
gd_client.ClientLogin(email, password, 'profileFeedAPI')
ms_client.ClientLogin(email, password, 'peopleCSVupload')
profiles_feed = gd_client.GetProfilesFeed('https://www.google.com/m8/feeds/profiles/domain/domain.com/full?max-results=300')
soup = Soup(str(profiles_feed), selfClosingTags=['ns0:category','ns3:status', 'ns0:link','ns1:email'])
a = soup.findAll('ns0:entry')
f = open('C:\\people.csv', 'wb')
writer = csv.writer(f, quoting=csv.QUOTE_NONE, escapechar =' ')
for entry in a:
    writer.writerow([entry.find('ns1:familyname').text + ',' + entry.find('ns1:givenname').text + ',' + entry.find('ns1:fullname').text + ',' + entry.find('ns1:orgtitle').text + ',' + entry.find('ns1:orgdepartment').text + ',' + entry.find('ns1:orgname').text + ',' + entry.find('ns1:email',primary=True)['address']])
f.close()
ms = gdata.data.MediaSource(file_path="C:\\people.csv", content_type=gdata.docs.service.SUPPORTED_FILETYPES['CSV'])
csv_entry = ms_client.Upload(ms, "People File")
I know I could do this:
for entry in a:
    if entry.find('ns1:orgtitle') != None:
        print entry.find('ns1:orgtitle').text
    elif entry.find('ns1:orgtitle') == None:
        print('')
    if entry.find('ns1:familyname') != None:
        print entry.find('ns1:familyname').text
    elif entry.find('ns1:familyname') == None:
        print('')
etc...
But it is very long, and I don’t know how to concatenate the data so that it appears on one row. Any help is much appreciated.
You can wrap the find call in a helper like this:
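A minimal sketch of such a wrapper, assuming a helper named `find_text` (a hypothetical name, not part of BeautifulSoup). It returns a default string when `find()` returns `None` or when the tag has no text. The `FakeTag` class is only a stand-in so the snippet runs without the library; with real data you would pass the `entry` objects from `soup.findAll('ns0:entry')`.

```python
def find_text(entry, tag, default=''):
    """Return the text of the first `tag` inside `entry`, or `default`.

    `find_text` is a hypothetical helper name; it relies only on
    `entry.find(tag)` returning None when the tag is missing.
    """
    node = entry.find(tag)
    if node is None:
        return default
    # An existing but empty tag also falls back to the default.
    return node.text or default


class FakeTag(object):
    """Stand-in for a BeautifulSoup tag (assumption, for demonstration only)."""

    def __init__(self, text='', children=None):
        self.text = text
        self.children = children or {}

    def find(self, name):
        # Like BeautifulSoup's find(), return None when nothing matches.
        return self.children.get(name)


entry = FakeTag(children={'ns1:familyname': FakeTag('Smith')})

print(find_text(entry, 'ns1:familyname'))  # tag present
print(find_text(entry, 'ns1:orgtitle'))    # tag missing, prints an empty line
```

With this in place, each `entry.find(...).text` in the `writerow` call can become `find_text(entry, ...)`, so a missing tag yields an empty field instead of an exception.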
Then you can either make the seven calls one after another, or use map(), like:
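A sketch of the `map()` variant, under the same assumptions: `find_text`, `FIELDS`, and `entry_to_row` are hypothetical names, and `FakeTag` stands in for a BeautifulSoup tag so the snippet runs on its own. It covers the six text fields. The e-mail address is stored in an attribute rather than in the tag text, so it would need its own `None` check. The demo writes to an in-memory buffer with Python 3's `io.StringIO`; under Python 2 you would keep the file opened in `'wb'` mode as in the question.

```python
import csv
import io

# Tag names taken from the question; one CSV column per tag.
FIELDS = ['ns1:familyname', 'ns1:givenname', 'ns1:fullname',
          'ns1:orgtitle', 'ns1:orgdepartment', 'ns1:orgname']


def find_text(entry, tag, default=''):
    """Hypothetical helper: text of `tag` inside `entry`, or `default`."""
    node = entry.find(tag)
    if node is None:
        return default
    return node.text or default


def entry_to_row(entry):
    """Build one CSV row (a list of strings) for a single entry."""
    return list(map(lambda tag: find_text(entry, tag), FIELDS))


class FakeTag(object):
    """Stand-in for a BeautifulSoup tag (assumption, for demonstration only)."""

    def __init__(self, text='', children=None):
        self.text = text
        self.children = children or {}

    def find(self, name):
        return self.children.get(name)


entry = FakeTag(children={
    'ns1:familyname': FakeTag('Smith'),
    'ns1:givenname': FakeTag('Ann'),
    'ns1:fullname': FakeTag('Ann Smith'),
    'ns1:orgname': FakeTag('Example'),
})

row = entry_to_row(entry)

# Pass the list itself to writerow() so the csv module inserts the commas.
buf = io.StringIO()
csv.writer(buf).writerow(row)
print(buf.getvalue())
```

Passing a list of fields to `writerow()` lets the `csv` module handle separators and quoting, which avoids joining the values with `','` by hand.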