In my application users enter a url and I try to open the link

Question

Asked: May 20, 2026 (2026-05-20T11:04:28+00:00)

In my application, users enter a URL, and I try to open the link and get the title of the page. I have found that many different kinds of errors can occur, including Unicode characters or newlines in titles, as well as AttributeError and IOError. At first I tried to catch each error individually, but now, if fetching the URL fails for any reason, I want to redirect to an error page where the user can enter the title manually. How do I catch all possible errors? This is the code I have now:

    title = "title"

    try:

        soup = BeautifulSoup.BeautifulSoup(urllib.urlopen(url))
        title = str(soup.html.head.title.string)

        if title == "404 Not Found":
            self.redirect("/urlparseerror")
        elif title == "403 - Forbidden":
            self.redirect("/urlparseerror")     
        else:
            title = str(soup.html.head.title.string).lstrip("\r\n").rstrip("\r\n")

    except UnicodeDecodeError:    
        self.redirect("/urlparseerror?error=UnicodeDecodeError")

    except AttributeError:        
        self.redirect("/urlparseerror?error=AttributeError")

    #https url:    
    except IOError:        
        self.redirect("/urlparseerror?error=IOError")


    #I tried this else clause to catch any other error
    #but it does not work
    #this is executed when none of the errors above is true:
    #
    #else:
    #    self.redirect("/urlparseerror?error=some-unknown-error-caught-by-else")

UPDATE

As suggested by @Wooble in the comments, I added a try...except around the code that writes the title to the database:

        try:
            new_item = Main(
                        ....
                        title = unicode(title, "utf-8"))

            new_item.put()

        except UnicodeDecodeError:    

            self.redirect("/urlparseerror?error=UnicodeDecodeError")

This works, although the out-of-range character — is still in the title according to the logging info:

    title: 7.2. re — Regular expression operations &mdash; Python v2.7.1 documentation

Do you know why?
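One likely explanation, sketched here under the assumption that the fetched title is a UTF-8 byte string: decoding with "utf-8" does not remove non-ASCII characters, it converts them. A UnicodeDecodeError is raised only when the bytes cannot be decoded with the chosen codec, so a valid em dash simply stays in the title. The sample bytes and variable names below are made up for illustration, and the snippet uses Python 3 syntax rather than the Python 2 code in the question.

```python
# Sample data only: UTF-8 bytes for a title containing an em dash (U+2014).
raw_title = b"7.2. re \xe2\x80\x94 Regular expression operations"

# Decoding valid UTF-8 succeeds and keeps the dash as a real character.
decoded = raw_title.decode("utf-8")

# The ASCII codec cannot handle the byte 0xE2, so this decode raises.
try:
    raw_title.decode("ascii")
    ascii_failed = False
except UnicodeDecodeError:
    ascii_failed = True

# If the goal is to drop non-ASCII characters, that has to be done explicitly.
ascii_only = decoded.encode("ascii", "ignore").decode("ascii")
```

If the character really should be removed or replaced, that has to be a deliberate step such as the last line above. The try...except only signals bytes that could not be decoded at all.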


1 Answer

Editorial Team · Answer 1 · 2026-05-20T11:04:29+00:00

You can use except without specifying any type to catch all exceptions.

From the Python docs (http://docs.python.org/tutorial/errors.html):

    import sys

    try:
        f = open('myfile.txt')
        s = f.readline()
        i = int(s.strip())
    except IOError as (errno, strerror):
        print "I/O error({0}): {1}".format(errno, strerror)
    except ValueError:
        print "Could not convert data to an integer."
    except:
        print "Unexpected error:", sys.exc_info()[0]
        raise

The last except clause will catch any exception that has not been caught by an earlier clause (i.e. an exception that is neither an IOError nor a ValueError).
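Applied to the title-fetching case in the question, the catch-all idea might look like the sketch below. It catches Exception rather than using a bare except, because a bare except also catches KeyboardInterrupt and SystemExit, which is rarely what is wanted. Several things here are assumptions and not part of the original code: the names get_title and _TitleParser, the fetch parameter (a hypothetical callable that takes a URL and returns the page's HTML as text), the use of the standard-library HTMLParser in place of BeautifulSoup, and Python 3 syntax.

```python
import logging
from html.parser import HTMLParser


class _TitleParser(HTMLParser):
    """Collects the text inside the first <title> element (illustrative helper)."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self._done = False
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "title" and not self._done:
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title" and self._in_title:
            self._in_title = False
            self._done = True

    def handle_data(self, data):
        if self._in_title:
            self.parts.append(data)


def get_title(url, fetch):
    """Return the page title, or None if anything at all goes wrong.

    `fetch` is a hypothetical callable: it takes a URL and returns HTML text.
    """
    try:
        parser = _TitleParser()
        parser.feed(fetch(url))
        # strip() removes the leading/trailing newlines mentioned in the question.
        title = "".join(parser.parts).strip()
        return title or None
    except Exception:
        # Log the full traceback so the real cause is not silently lost.
        logging.exception("Could not get title for %r", url)
        return None
```

A request handler could then redirect to the manual-entry page whenever get_title returns None, instead of handling each exception type separately. Logging the exception keeps the underlying cause visible even though every error takes the same path.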
