I’m having a bit of an issue. I’m parsing a large xml file using

Question

Asked: May 23, 2026 (2026-05-23T07:40:14+00:00)

I’m having a bit of an issue. I’m parsing a large XML file using Python. The problem is that the XML file is unpredictable: sometimes certain elements are not present, and Python throws an exception when my code looks for one of them. I want Python to simply ignore that exception and move on to the next element.

Here is my code at the moment, which doesn’t work. If it can’t find the element it’s looking for, it throws an exception and jumps straight out of the try-except block, so none of the remaining elements are read.

# now we can parse the xml we fetched.
        try:
            user = {}
            feedLinks = response.getElementsByTagName('gd:feedLink')
            statistics = response.getElementsByTagName('yt:statistics')[0]
            user['id'] = response.getElementsByTagName('id')[0].firstChild.data
            user['channel_title'] = response.getElementsByTagName('title')[0].firstChild.data
            user['profile_url'] = response.getElementsByTagName('link')[0].getAttribute('href')
            user['author_name'] = response.getElementsByTagName('author')[0].firstChild.firstChild.data
            user['author_uri'] = response.getElementsByTagName('uri')[0].firstChild.data
            user['age'] = response.getElementsByTagName('yt:age')[0].firstChild.data
            user['favourites_url'] = feedLinks[0].getAttribute('href')
            user['contacts_url'] = feedLinks[1].getAttribute('href')
            user['playlists'] = feedLinks[3].getAttribute('href')
            user['subscriptions'] = feedLinks[4].getAttribute('href')
            user['uploads'] = feedLinks[5].getAttribute('href')
            user['new_subscription_videos'] = feedLinks[6].getAttribute('href')
            user['statistics'] = {'last_access':statistics.getAttribute('lastWebAccess'), 
                            'subscriber_count':statistics.getAttribute('subscriberCount'), 
                            'video_watch_count':statistics.getAttribute('videoWatchCount'),
                            'view_count':statistics.getAttribute('viewCount'), 
                            'total_upload_views':statistics.getAttribute('totalUploadViews')}
            user['gender'] = response.getElementsByTagName('yt:gender')[0].firstChild.data
            user['location'] = response.getElementsByTagName('yt:location')[0].firstChild.data
            user['profile_pic_url'] = response.getElementsByTagName('media:thumbnail')[0].getAttribute('url')
            user['username'] = response.getElementsByTagName('yt:username')[0].firstChild.data
        except Exception as error:
            # store the error for logging later
            self.errors.append(str(error) + " from main.py:Crawler")

Does anybody have any ideas?

1 Answer

Editorial Team · Answer 1 · 2026-05-23T07:40:15+00:00

from lxml import etree

def parse():
    # Path to the XML file to convert.
    xmlFileName = '/home/shariq/abc2.xml'
    postsList = []
    tree = etree.parse(xmlFileName)
    # Each <doc> under <add> becomes one dictionary.
    for post in tree.xpath("//add/doc"):
        thispost = {}
        for child in post:
            # Skip children that have no "name" attribute (e.g. comments).
            fieldName = child.get("name")
            if fieldName is None:
                continue
            thispost[fieldName.strip()] = child.text
        postsList.append(thispost)
    return postsList

The function above converts the XML into a list of Python dictionaries, one per <doc> element.

The XML I used has this form:

<?xml version="1.0"?>
<add>
  <doc>
    <field name="country">Serbia</field>
    <field name="date">20110518</field>
    <field name="source">Dan</field>
    <field name="lang">Serbian</field>
    <field name="category">news</field>
    <field name="time">1305744480</field>
    <field name="title">&#268;iste rigole prema Spu&#382;u</field>
    <field name="id">4641119297</field>
  </doc>
  <doc>
    <field name="country">France</field>
    <field name="date">20110518</field>
    <field name="harvest_time">1305744480</field>
    <field name="source">Sport24.com</field>
    <field name="source_rank">3</field>
    <field name="lang">French</field>
    <field name="siteurl">http://www.sport24.com</field>
    <field name="category">news</field>
    <field name="time">1305744480</field>
    <field name="title">La plus belle pour Sharapova</field>
    <field name="id">4641119295</field>
  </doc>
</add>
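
To illustrate what the resulting dictionaries look like, here is a minimal sketch. The `posts` list is a hand-written subset of what the function would plausibly return for the sample above, and `site_urls` is a hypothetical helper, not part of the original answer. The point is that an optional field such as `siteurl` is simply absent from the first record, and `dict.get` reads it without raising.

```python
# Hand-written subset of the records for the sample XML above (assumption:
# only a few fields are shown to keep the sketch short).
posts = [
    {"country": "Serbia", "source": "Dan", "id": "4641119297"},
    {"country": "France", "source": "Sport24.com",
     "siteurl": "http://www.sport24.com", "id": "4641119295"},
]


def site_urls(records, default=None):
    """Return each record's 'siteurl', or `default` when the field is absent."""
    return [record.get("siteurl", default) for record in records]


urls = site_urls(posts, default="(none)")
```

Indexing with `record["siteurl"]` would raise a `KeyError` on the first record, whereas `get` falls back to the default.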

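Once you have the dictionaries, your problem becomes much simpler: a field that is missing from the XML is just a missing key, which you can read with dict.get() instead of catching an exception.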
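
If you would rather keep the `xml.dom.minidom` code from the question, the same idea can be applied there by checking for each element before reading it. The sketch below is one possible way to do that under stated assumptions: `first_text` and `first_attr` are hypothetical helpers, and the sample XML and its namespace URI are made up for the example rather than taken from the real feed.

```python
from xml.dom import minidom


def first_text(node, tag, default=None):
    """Text of the first <tag> element under node, or default if it is missing."""
    matches = node.getElementsByTagName(tag)
    if not matches or matches[0].firstChild is None:
        return default
    # Only text nodes carry .data; fall back to the default otherwise.
    return getattr(matches[0].firstChild, "data", default)


def first_attr(node, tag, attr, default=None):
    """Attribute of the first <tag> element under node, or default if missing."""
    matches = node.getElementsByTagName(tag)
    if not matches or not matches[0].hasAttribute(attr):
        return default
    return matches[0].getAttribute(attr)


# Made-up sample: note that there is no <yt:age> element.
SAMPLE = """<entry xmlns:yt="http://example.invalid/yt">
  <id>abc123</id>
  <yt:username>someuser</yt:username>
  <link href="http://example.invalid/profile"/>
</entry>"""

response = minidom.parseString(SAMPLE)
user = {
    "id": first_text(response, "id"),
    "username": first_text(response, "yt:username"),
    "age": first_text(response, "yt:age"),  # missing, so this is None
    "profile_url": first_attr(response, "link", "href"),
}
```

Because each lookup handles its own missing element, one absent tag no longer aborts the rest of the assignments the way a single surrounding try block does.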
