The Archive Base

Editorial Team
Asked: June 3, 2026 (2026-06-03T20:18:24+00:00)

How can I modify my script to skip a URL if the connection times out or is invalid/404?

Python

#!/usr/bin/python

#parser.py: Downloads Bibles and parses all data within <article> tags.

__author__      = "Cody Bouche"
__copyright__   = "Copyright 2012 Digital Bible Society"

from BeautifulSoup import BeautifulSoup
import lxml.html as html
import urlparse
import os, sys
import urllib2
import re

print ("downloading and parsing Bibles...")
root = html.parse(open('links.html'))
for link in root.findall('//a'):
    url = link.get('href')
    name = urlparse.urlparse(url).path.split('/')[-1]
    dirname = urlparse.urlparse(url).path.split('.')[-1]
    f = urllib2.urlopen(url)
    s = f.read()
    if (os.path.isdir(dirname) == 0):
        os.mkdir(dirname)
    soup = BeautifulSoup(s)
    articleTag = soup.html.body.article
    converted = str(articleTag)
    full_path = os.path.join(dirname, name)
    open(full_path, 'wb').write(converted)
    print(name)
print("DOWNLOADS COMPLETE!")
1 Answer

  1. Editorial Team
    Added an answer on June 3, 2026 at 8:18 pm

    To apply a timeout to your request, pass the timeout argument in your call to urlopen. From the docs:

    The optional timeout parameter specifies a timeout in seconds for
    blocking operations like the connection attempt (if not specified, the
    global default timeout setting will be used). This actually only works
    for HTTP, HTTPS and FTP connections.
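A minimal sketch of passing the timeout by keyword. It is written for Python 3 as an assumption (there, urllib2's urlopen lives in urllib.request); the `fetch` helper and its `opener` parameter are hypothetical names introduced only for illustration and to make the function easy to exercise without a network.

```python
from urllib.request import urlopen


def fetch(url, timeout=3, opener=urlopen):
    """Return the response body, giving up after `timeout` seconds of blocking."""
    # Pass the timeout by keyword: the second positional parameter of
    # urlopen is `data` (the request body), not the timeout.
    response = opener(url, timeout=timeout)
    try:
        return response.read()
    finally:
        response.close()
```

Calling `fetch(url, timeout=10)` would then wait at most roughly ten seconds on each blocking operation before raising.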

    For the exceptions urllib2 can raise, refer to this guide’s section on handling exceptions with urllib2. I found the whole guide very useful.

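    HTTP status 408 (Request Timeout) is a code the server sends back. A timeout set through the timeout argument is raised on the client side instead, as a URLError whose reason is a socket.timeout and which has no code attribute. Wrapping it up, to handle timeouts and HTTP errors such as 404 you would: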

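    import socket
    from urllib2 import urlopen, URLError

    try:
        # pass the timeout by keyword; the second positional argument is `data`
        response = urlopen(req, timeout=3)  # 3 seconds
    except URLError as e:
        if hasattr(e, 'code'):
            # HTTPError: the server answered with an error status
            if e.code == 408:
                print('Request Timeout %d' % e.code)
            if e.code == 404:
                print('File Not Found %d' % e.code)
            # etc etc
        elif isinstance(e.reason, socket.timeout):
            # client-side timeout: there is no HTTP status code to inspect
            print('Timed out')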
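Applied to the download loop in the question, the skip logic might look like the sketch below. It assumes Python 3 (urllib.request and urllib.error in place of urllib2); `fetch_or_skip` and its `opener` parameter are hypothetical names, not part of the original script. A client-side timeout shows up as a URLError or a bare socket.timeout rather than as an HTTP status, so both are caught.

```python
import socket
from urllib.error import HTTPError, URLError
from urllib.request import urlopen


def fetch_or_skip(url, timeout=3, opener=urlopen):
    """Return the page body as bytes, or None if the URL should be skipped."""
    try:
        response = opener(url, timeout=timeout)
        try:
            return response.read()
        finally:
            response.close()
    except HTTPError as e:
        # The server replied with an error status such as 404 or 408.
        print("skipping %s: HTTP %d" % (url, e.code))
    except URLError as e:
        # No usable reply: DNS failure, refused connection, connect timeout.
        print("skipping %s: %s" % (url, e.reason))
    except socket.timeout:
        # The connection opened but reading the body timed out.
        print("skipping %s: read timed out" % url)
    except ValueError as e:
        # Malformed URL, for example a missing or unknown scheme.
        print("skipping %s: %s" % (url, e))
    return None
```

Inside the loop, the two lines that open and read the URL would become `s = fetch_or_skip(url)` followed by `if s is None: continue`, so a bad link is reported and the script moves on to the next one.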
    

© 2021 The Archive Base. All Rights Reserved