The Archive Base

Editorial Team
Asked: May 25, 2026

I am completely new to multiprocessing. I have been reading the documentation for the multiprocessing module.

I am completely new to multiprocessing. I have been reading the documentation for the multiprocessing module. I read about Pool, threads, Queues, etc., but I am completely lost.

What I want to do with multiprocessing is convert my humble HTTP downloader to work with multiple workers. At the moment I download a page, parse the page to get interesting links, and continue until all interesting links have been downloaded. Now I want to implement this with multiprocessing, but I have no idea how to organize the workflow. I had two thoughts about this.

First, I thought about having two queues: one for links that need to be downloaded, the other for pages that need to be parsed. One worker downloads pages and adds them to the queue of items to be parsed. Another process parses a page and adds the links it finds interesting to the download queue. I expect two problems with this approach. First, why download only one page at a time and parse only one page at a time? Second, how does a process know that more items will be added to a queue later, after it has exhausted all the items currently in it?

Another approach I thought about is to have a function that is called with a URL as its argument. This function downloads the document and starts parsing it for links. Every time it encounters an interesting link, it immediately creates a new thread running the same function. The problems I have with this approach are: how do I keep track of all the processes spawned, how do I know whether there are still processes running, and how do I limit the maximum number of processes?

So I am completely lost. Can anyone suggest a good strategy, and perhaps show some example code for how to go about it?

1 Answer

  1. Editorial Team
    Added an answer on May 25, 2026 at 8:10 pm

    Here is one approach, using multiprocessing. (Many thanks to @Voo for suggesting many improvements to the code.)

    import multiprocessing as mp
    import logging
    import time
    
    logger=mp.log_to_stderr(logging.DEBUG)
    # logger=mp.log_to_stderr(logging.WARN) # use this line instead to silence debug and info messages
    
    def worker(url_queue,seen):
        while True:
            url=url_queue.get()
            if url not in seen:
                logger.info('downloading {u}'.format(u=url))
                seen[url]=True
                # Replace this with code to download url, e.g.
                # urllib.request.urlopen(url).read()  (urllib2.urlopen in Python 2)
                time.sleep(0.5)
                content=url
                logger.debug('parsing {c}'.format(c=content))
                # replace this with code that finds interesting links and
                # puts them in url_queue
                for i in range(3):
                    if content<5:
                        u=2*content+i-1
                        logger.debug('adding {u} to url_queue'.format(u=u))
                        time.sleep(0.5)
                        url_queue.put(u)
            else:
                logger.debug('skipping {u}; seen before'.format(u=url))
            url_queue.task_done()
    
    if __name__=='__main__':
        num_workers=4
        url_queue=mp.JoinableQueue()
        manager=mp.Manager()
        seen=manager.dict()
    
        # prime the url queue with at least one url
        url_queue.put(1)
        downloaders=[mp.Process(target=worker,args=(url_queue,seen))
                     for i in range(num_workers)]
        for p in downloaders:
            p.daemon=True
            p.start()
        url_queue.join()
    
    • Four worker processes are created (plain mp.Process objects, not
      an mp.Pool).
    • There is a JoinableQueue, called url_queue.
    • Each worker gets a url from the url_queue, finds new urls and adds
      them to the url_queue.
    • A worker calls url_queue.task_done() only after it has added its
      new items, so the queue's count of unfinished tasks cannot reach
      zero while new urls are still on the way.
    • The main process calls url_queue.join(). This blocks the main
      process until task_done has been called for every task in the
      url_queue.
    • Since the worker processes have the daemon attribute set to True,
      they too end when the main process ends.
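
The "finds new urls" step is only a placeholder in the code above. As a rough sketch of what could go there, assuming Python 3 and only the standard library, the following collects the links from a downloaded page. The names `LinkCollector` and `interesting_links` are hypothetical, and "same host as the page" is just one assumed meaning of "interesting":

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collect the href targets of <a> tags as absolute URLs."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links against the page they came from.
                self.links.append(urljoin(self.base_url, value))


def interesting_links(base_url, html):
    """Hypothetical filter: keep http(s) links on the same host as base_url."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    host = urlparse(base_url).netloc
    return [
        u for u in collector.links
        if urlparse(u).scheme in ("http", "https") and urlparse(u).netloc == host
    ]
```

Inside the worker, each url returned by `interesting_links(url, content)` would be passed to `url_queue.put(...)` before `task_done()` is called.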

    All the components used in this example are also explained in Doug Hellmann’s excellent Python Module of the Week tutorial on multiprocessing.
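
One caveat about the worker above: `if url not in seen` and `seen[url]=True` are two separate operations on the shared dict, so two workers that receive the same url at nearly the same time could both pass the check and download it twice. A minimal sketch of one way to close that gap is to do the check and the mark under a lock. The helper name `claim` is hypothetical; with multiprocessing one would presumably pass a `manager.Lock()` to each worker alongside `seen`. The demo below uses threads and a plain dict only to keep it short:

```python
import threading


def claim(url, seen, lock):
    """Return True if this caller is the first to claim url, else False."""
    with lock:
        if url in seen:
            return False
        seen[url] = True
        return True


if __name__ == "__main__":
    seen, lock, wins = {}, threading.Lock(), []

    def try_claim():
        wins.append(claim("http://example.com/", seen, lock))

    threads = [threading.Thread(target=try_claim) for _ in range(8)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    # Exactly one of the eight threads wins the claim.
    print(wins.count(True))
```

In the worker, `if url not in seen:` followed by `seen[url]=True` would become `if claim(url, seen, lock):`.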

© 2021 The Archive Base. All Rights Reserved