I’m having a bit of an issue. I’m parsing a large XML file using Python. The problem is that the XML is unpredictable: sometimes certain elements are not present, and Python throws an exception when my code looks for them. I want Python to simply ignore the exception and move on to the next element.
Here is my code at the moment, which doesn’t work. If it can’t find the element it’s looking for, it throws an exception and jumps straight out of the try-except block, so every field after the missing one is skipped.
# Now we can parse the XML we fetched.
try:
    user = {}
    feedLinks = response.getElementsByTagName('gd:feedLink')
    statistics = response.getElementsByTagName('yt:statistics')[0]
    user['id'] = response.getElementsByTagName('id')[0].firstChild.data
    user['channel_title'] = response.getElementsByTagName('title')[0].firstChild.data
    user['profile_url'] = response.getElementsByTagName('link')[0].getAttribute('href')
    user['author_name'] = response.getElementsByTagName('author')[0].firstChild.firstChild.data
    user['author_uri'] = response.getElementsByTagName('uri')[0].firstChild.data
    user['age'] = response.getElementsByTagName('yt:age')[0].firstChild.data
    user['favourites_url'] = feedLinks[0].getAttribute('href')
    user['contacts_url'] = feedLinks[1].getAttribute('href')
    user['playlists'] = feedLinks[3].getAttribute('href')
    user['subscriptions'] = feedLinks[4].getAttribute('href')
    user['uploads'] = feedLinks[5].getAttribute('href')
    user['new_subscription_videos'] = feedLinks[6].getAttribute('href')
    user['statistics'] = {'last_access': statistics.getAttribute('lastWebAccess'),
                          'subscriber_count': statistics.getAttribute('subscriberCount'),
                          'video_watch_count': statistics.getAttribute('videoWatchCount'),
                          'view_count': statistics.getAttribute('viewCount'),
                          'total_upload_views': statistics.getAttribute('totalUploadViews')}
    user['gender'] = response.getElementsByTagName('yt:gender')[0].firstChild.data
    user['location'] = response.getElementsByTagName('yt:location')[0].firstChild.data
    user['profile_pic_url'] = response.getElementsByTagName('media:thumbnail')[0].getAttribute('url')
    user['username'] = response.getElementsByTagName('yt:username')[0].firstChild.data
except Exception as error:
    # Store the error for logging later. Note that the first missing
    # element aborts the whole block, so later fields are never filled in.
    self.errors.append(str(error) + " from main.py:Crawler")
Does anybody have any ideas?
The function above converts the XML into a Python dictionary.
The XML I used has this form:
Once you have the dictionary, the problem becomes a thousand times easier.
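The conversion function and sample XML referred to in this answer are not reproduced here, so the following is only a minimal sketch of the same idea, not the original code. It assumes `xml.etree.ElementTree` in place of the `minidom` calls from the question, and the names `element_to_dict`, `text_of`, `attr_of` and the `SAMPLE` document are all hypothetical. The point is that once the data is in a dictionary, a missing element yields a default value instead of raising an exception.

```python
import xml.etree.ElementTree as ET


def element_to_dict(element):
    """Recursively convert an Element into nested dicts (hypothetical helper)."""
    children = {}
    for child in element:
        # Drop any "{namespace-uri}" prefix so keys are plain local tag names.
        tag = child.tag.split("}")[-1]
        children.setdefault(tag, []).append(element_to_dict(child))
    return {
        "attrs": dict(element.attrib),
        "text": (element.text or "").strip(),
        "children": children,
    }


def first(node, tag):
    """Return the first child dict named `tag`, or None if there is none."""
    matches = node["children"].get(tag)
    return matches[0] if matches else None


def text_of(node, tag, default=None):
    """Text of the first `tag` child, or `default` if missing or empty."""
    child = first(node, tag)
    return child["text"] if child is not None and child["text"] else default


def attr_of(node, tag, name, default=None):
    """Attribute `name` of the first `tag` child, or `default` if missing."""
    child = first(node, tag)
    return child["attrs"].get(name, default) if child is not None else default


# Made-up sample document; the namespace URI is a placeholder.
SAMPLE = """<entry xmlns:yt="http://example.com/yt">
  <id>abc123</id>
  <title>My channel</title>
  <link href="http://example.com/profile"/>
  <yt:username>someuser</yt:username>
</entry>"""

entry = element_to_dict(ET.fromstring(SAMPLE))
user = {
    "id": text_of(entry, "id"),
    "channel_title": text_of(entry, "title"),
    "profile_url": attr_of(entry, "link", "href"),
    "username": text_of(entry, "username"),
    "age": text_of(entry, "age"),  # not present in SAMPLE, falls back to None
}
```

Each field is looked up independently, so an absent element only affects its own key and the remaining fields are still filled in. If you need to log which elements were missing, you could check for `None` values in `user` afterwards.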