Sign Up

Sign Up to our social questions and Answers Engine to ask questions, answer people’s questions, and connect with other people.

Have an account? Sign In

Have an account? Sign In Now

Sign In

Login to our social questions & Answers Engine to ask questions answer people’s questions & connect with other people.

Sign Up Here

Forgot Password?

Don't have account, Sign Up Here

Forgot Password

Lost your password? Please enter your email address. You will receive a link and will create a new password via email.

Have an account? Sign In Now

You must login to ask a question.

Forgot Password?

Need An Account, Sign Up Here

Please briefly explain why you feel this question should be reported.

Please briefly explain why you feel this answer should be reported.

Please briefly explain why you feel this user should be reported.

Sign InSign Up

The Archive Base

The Archive Base Logo The Archive Base Logo

The Archive Base Navigation

  • SEARCH
  • Home
  • About Us
  • Blog
  • Contact Us
Search
Ask A Question

Mobile menu

Close
Ask a Question
  • Home
  • Add group
  • Groups page
  • Feed
  • User Profile
  • Communities
  • Questions
    • New Questions
    • Trending Questions
    • Must read Questions
    • Hot Questions
  • Polls
  • Tags
  • Badges
  • Buy Points
  • Users
  • Help
  • Buy Theme
  • SEARCH
Home/ Questions/Q 7852713
In Process

The Archive Base Latest Questions

Editorial Team
  • 0
Editorial Team
Asked: June 2, 20262026-06-02T19:29:20+00:00 2026-06-02T19:29:20+00:00

How would I make the below script download multiple links at once instead of

  • 0

How would I make the below script download multiple links at once instead of one at a time with urllib2?

python:

from BeautifulSoup import BeautifulSoup
import lxml.html as html
import urlparse
import os, sys
import urllib2
import re

print ("downloading and parsing Bibles...")
root = html.parse(open('links.html'))
for link in root.findall('//a'):
  url = link.get('href')
  name = urlparse.urlparse(url).path.split('/')[-1]
  dirname = urlparse.urlparse(url).path.split('.')[-1]
  f = urllib2.urlopen(url)
  s = f.read()
  if (os.path.isdir(dirname) == 0): 
    os.mkdir(dirname)
  soup = BeautifulSoup(s)
  articleTag = soup.html.body.article
  converted = str(articleTag)
  full_path = os.path.join(dirname, name)
  open(full_path, 'w').write(converted)
  print(name)
print("DOWNLOADS COMPLETE!")

links.html

<a href="http://www.youversion.com/bible/gen.1.nmv-fas">http://www.youversion.com/bible/gen.1.nmv-fas</a>

<a href="http://www.youversion.com/bible/gen.2.nmv-fas">http://www.youversion.com/bible/gen.2.nmv-fas</a>

<a href="http://www.youversion.com/bible/gen.3.nmv-fas">http://www.youversion.com/bible/gen.3.nmv-fas</a>

<a href="http://www.youversion.com/bible/gen.4.nmv-fas">http://www.youversion.com/bible/gen.4.nmv-fas</a>

<a href="http://www.youversion.com/bible/gen.5.nmv-fas">http://www.youversion.com/bible/gen.5.nmv-fas</a>

<a href="http://www.youversion.com/bible/gen.6.nmv-fas">http://www.youversion.com/bible/gen.6.nmv-fas</a>
  • 1 1 Answer
  • 0 Views
  • 0 Followers
  • 0
Share
  • Facebook
  • Report

Leave an answer
Cancel reply

You must login to add an answer.

Forgot Password?

Need An Account, Sign Up Here

1 Answer

  • Voted
  • Oldest
  • Recent
  • Random
  1. Editorial Team
    Editorial Team
    2026-06-02T19:29:21+00:00Added an answer on June 2, 2026 at 7:29 pm

    Blainer, try threading.

    Here’s a good practical example

    http://www.ibm.com/developerworks/aix/library/au-threadingpython/

    Then reference the python std library as well

    http://docs.python.org/library/threading.html

    If you look on the practical example it actually has a sample of threaded version of urllib2 concurrent downloads. I I went ahead and took you a few step more into the process, you will have to work with the part that says fix this to further parse your html out..

    #!/usr/bin/env python
    
    import Queue
    import threading
    import urllib2
    import time
    import htmllib, formatter
    
    class LinksExtractor(htmllib.HTMLParser):
        # derive new HTML parser
    
        def __init__(self, formatter):        
            # class constructor
            htmllib.HTMLParser.__init__(self, formatter)  
            # base class constructor
            self.links = []        
            # create an empty list for storing hyperlinks
    
        def start_a(self, attrs) :  # override handler of <A ...>...</A> tags
            # process the attributes
            if len(attrs) > 0 :
                for attr in attrs :
                    if attr[0] == "href":         
                        # ignore all non HREF attributes
                        self.links.append(attr[1]) # save the link info in the list
    
        def get_links(self) :     
            # return the list of extracted links
            return self.links
    
    format = formatter.NullFormatter()
    htmlparser = LinksExtractor(format)
    
    data = open("links.html")
    htmlparser.feed(data.read())
    htmlparser.close()
    
    hosts = htmlparser.links
    
    queue = Queue.Queue()
    
    class ThreadUrl(threading.Thread):
        """Threaded Url Grab"""
        def __init__(self, queue):
            threading.Thread.__init__(self)
            self.queue = queue
    
        def run(self):
            while True:
                #grabs host from queue
                host = self.queue.get()
    
                ####################################
                ############FIX THIS PART###########
                #VVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVV#
    
                url = urllib2.urlopen(host)
                morehtml = url.read() # your own your own with this
    
                #signals to queue job is done
                self.queue.task_done()
    
    start = time.time()
    def main():
        #spawn a pool of threads, and pass them queue instance 
        for i in range(5):
            t = ThreadUrl(queue)
            t.setDaemon(True)
            t.start()
    
            #populate queue with data   
        for host in hosts:
            queue.put(host)
    
        #wait on the queue until everything has been processed     
        queue.join()
    
    main()
    print "Elapsed Time: %s" % (time.time() - start)
    
    • 0
    • Reply
    • Share
      Share
      • Share on Facebook
      • Share on Twitter
      • Share on LinkedIn
      • Share on WhatsApp
      • Report

Sidebar

Related Questions

I would make to make a small WYSIWYG editor similar to the one used
Would it make sense to switch to HipHop instead of XCache? Is HipHop ready
I would like to call an R script from Java. I have done google
I'm trying to make a scrolling calendar script using jquery (similar to the one
I have the below script to import data from a csv file on my
I have programmed a proxy.php script (listed below), which would fetch an image specified
Is there any client-side script that would be able to make changes to a
It would make life so much easier!
i would make an application client server with java to change or modify the
I edited the question so it would make more sense. I have a function

Explore

  • Home
  • Add group
  • Groups page
  • Communities
  • Questions
    • New Questions
    • Trending Questions
    • Must read Questions
    • Hot Questions
  • Polls
  • Tags
  • Badges
  • Users
  • Help
  • SEARCH

Footer

© 2021 The Archive Base. All Rights Reserved
With Love by The Archive Base

Insert/edit link

Enter the destination URL

Or link to existing content

    No search term specified. Showing recent items. Search or use up and down arrow keys to select an item.