In my application, users enter a URL, and I try to open the link and get the title of the page. But I realized that many different kinds of errors can occur, including Unicode characters or newlines in titles, AttributeError, and IOError. I first tried to catch each error individually, but now, if fetching the URL fails for any reason, I want to redirect to an error page where the user will enter the title manually. How do I catch all possible errors? This is the code I have now:
title = "title"
try:
    soup = BeautifulSoup.BeautifulSoup(urllib.urlopen(url))
    title = str(soup.html.head.title.string)
    if title == "404 Not Found":
        self.redirect("/urlparseerror")
    elif title == "403 - Forbidden":
        self.redirect("/urlparseerror")
    else:
        title = str(soup.html.head.title.string).lstrip("\r\n").rstrip("\r\n")
except UnicodeDecodeError:
    self.redirect("/urlparseerror?error=UnicodeDecodeError")
except AttributeError:
    self.redirect("/urlparseerror?error=AttributeError")
#https url:
except IOError:
    self.redirect("/urlparseerror?error=IOError")
#I tried this else clause to catch any other error
#but it does not work
#this is executed when none of the errors above is true:
#
#else:
#    self.redirect("/urlparseerror?error=some-unknown-error-caught-by-else")
UPDATE
As suggested by @Wooble in the comments, I added try...except around writing the title to the database:
try:
    new_item = Main(
        ....
        title = unicode(title, "utf-8"))
    new_item.put()
except UnicodeDecodeError:
    self.redirect("/urlparseerror?error=UnicodeDecodeError")
This works, although the out-of-range character — is still in the title according to the logging info:
***title: 7.2. re — Regular expression operations — Python v2.7.1 documentation**
Do you know why?
You can use except without specifying any type to catch all exceptions.
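Applied to the question's case, a minimal sketch might look like the following. The names title_or_none and get_title are hypothetical: get_title stands in for the urlopen plus BeautifulSoup lookup, which is passed in so the sketch runs without a network. Note that it uses except Exception rather than a completely bare except; that is a deliberate substitution, since a bare except also catches KeyboardInterrupt and SystemExit, which you usually want to let through.

```python
def title_or_none(get_title, url):
    """Return the stripped page title, or None if anything goes wrong.

    get_title is a hypothetical callable standing in for the
    urlopen + BeautifulSoup lookup from the question.
    """
    try:
        title = get_title(url)
        # strip() removes surrounding whitespace, including \r and \n
        return title.strip()
    except Exception:
        # IOError, AttributeError, UnicodeDecodeError and any other
        # ordinary error end up here; the caller decides what to do next.
        return None
```

In the request handler, a None result would then be the single signal to call self.redirect("/urlparseerror") instead of handling each exception type separately.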
From the Python docs (http://docs.python.org/tutorial/errors.html):
The last except will catch any exception that has not been caught before (in the tutorial's example, an exception that is neither an IOError nor a ValueError).
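The clause ordering can be sketched as below. The function run_and_classify is a hypothetical helper written only for illustration; it returns the name of the clause that handled the call. It also shows why the else clause in the question did not behave as a catch-all: else runs only when the try block raised nothing at all.

```python
def run_and_classify(func):
    """Call func() and report which clause of the try statement ran."""
    try:
        func()
    except IOError:
        return "IOError"
    except ValueError:
        return "ValueError"
    except:
        # Bare except: reached only when no earlier clause matched.
        return "other"
    else:
        # Runs only when the try block raised no exception.
        return "no error"


def raise_io_error():
    raise IOError("simulated fetch failure")


print(run_and_classify(raise_io_error))       # handled by the IOError clause
print(run_and_classify(lambda: int("abc")))   # int() raises ValueError
print(run_and_classify(lambda: None.title))   # AttributeError, falls to bare except
print(run_and_classify(lambda: 1 + 1))        # nothing raised, else clause runs
```

One detail worth knowing when ordering the clauses: UnicodeDecodeError is a subclass of ValueError, so an except ValueError clause placed earlier would catch it too.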