I read up about threading in the IBM developer sources and found the following example.
In general I understand what happens here, except for one important thing. The work seems to be done in the run() function. In this example, run() only prints a line and signals to the queue that the job is done.
What if I had to return some processed data? I thought about caching it in a global variable and accessing it later, but that does not seem like the right way to go.
Any advice?
Perhaps I should clarify: my intuition tells me to add return processed_data to run() right after self.queue.task_done(), but I can’t figure out where to catch that return value, since it is not obvious to me where run() is called.
#!/usr/bin/env python
import Queue
import threading
import urllib2
import time

hosts = ["http://yahoo.com", "http://google.com", "http://amazon.com",
         "http://ibm.com", "http://apple.com"]

queue = Queue.Queue()

class ThreadUrl(threading.Thread):
    """Threaded Url Grab"""
    def __init__(self, queue):
        threading.Thread.__init__(self)
        self.queue = queue

    def run(self):
        while True:
            #grabs host from queue
            host = self.queue.get()

            #grabs urls of hosts and prints first 1024 bytes of page
            url = urllib2.urlopen(host)
            print url.read(1024)

            #signals to queue job is done
            self.queue.task_done()

start = time.time()

def main():
    #spawn a pool of threads, and pass them queue instance
    for i in range(5):
        t = ThreadUrl(queue)
        t.setDaemon(True)
        t.start()

    #populate queue with data
    for host in hosts:
        queue.put(host)

    #wait on the queue until everything has been processed
    queue.join()

main()
print "Elapsed Time: %s" % (time.time() - start)
You can’t return a value from run(): it is invoked by start() inside the new thread, so there is no caller in your code to receive a return value. In any case, there is normally more than one item to process in each thread, so you don’t want to return at all after processing one value (see the while loop in each thread).

I would either use another queue to return the results:
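A minimal sketch of the second-queue approach, under a few assumptions: it uses Python 3 naming (`queue` rather than `Queue`), and `fetch()`, `fetch_all()` and the `in_queue`/`out_queue` names are hypothetical. `fetch()` stands in for the `urllib2.urlopen(host).read(1024)` call so the example runs without network access.

```python
import queue
import threading


def fetch(host):
    # Hypothetical stand-in for urllib2.urlopen(host).read(1024).
    return "first bytes of " + host


class ThreadUrl(threading.Thread):
    """Worker that takes hosts from in_queue and puts results on out_queue."""

    def __init__(self, in_queue, out_queue):
        threading.Thread.__init__(self)
        self.daemon = True  # do not keep the process alive for idle workers
        self.in_queue = in_queue
        self.out_queue = out_queue

    def run(self):
        while True:
            host = self.in_queue.get()
            try:
                # Hand the result back through the second queue
                # instead of trying to return it from run().
                self.out_queue.put((host, fetch(host)))
            finally:
                # Signal that this job is finished, even if fetch() failed.
                self.in_queue.task_done()


def fetch_all(hosts, num_threads=5):
    in_queue = queue.Queue()
    out_queue = queue.Queue()

    for _ in range(num_threads):
        ThreadUrl(in_queue, out_queue).start()

    for host in hosts:
        in_queue.put(host)

    # Every result is put on out_queue before task_done() is called,
    # so after join() no worker is still producing output.
    in_queue.join()

    results = {}
    while not out_queue.empty():
        host, data = out_queue.get()
        results[host] = data
    return results
```

Calling `fetch_all(hosts)` in the main thread then yields a dictionary keyed by host. Results arrive on `out_queue` in completion order, which is why each one is paired with its host.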
or store the result in the same queue:
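One way to read "the same queue" is to enqueue mutable job records and let each worker write its result into the record it picked up. The sketch below follows that reading and makes the same assumptions as before: Python 3 naming, and hypothetical names throughout (`fetch()`, `fetch_all_in_place()`, and the `"host"`/`"result"` keys).

```python
import queue
import threading


def fetch(host):
    # Hypothetical stand-in for urllib2.urlopen(host).read(1024).
    return "first bytes of " + host


class ThreadUrl(threading.Thread):
    """Worker that stores each result in the job record it took from the queue."""

    def __init__(self, job_queue):
        threading.Thread.__init__(self)
        self.daemon = True
        self.queue = job_queue

    def run(self):
        while True:
            job = self.queue.get()
            try:
                # The record is shared with the main thread, so writing
                # to it here makes the result visible after join().
                job["result"] = fetch(job["host"])
            finally:
                self.queue.task_done()


def fetch_all_in_place(hosts, num_threads=5):
    job_queue = queue.Queue()
    # One mutable record per host; "result" is filled in by a worker.
    jobs = [{"host": host, "result": None} for host in hosts]

    for _ in range(num_threads):
        ThreadUrl(job_queue).start()

    for job in jobs:
        job_queue.put(job)

    # Wait until every record has been processed.
    job_queue.join()
    return jobs
```

Because the main thread keeps its own list of records, the results stay in the original host order and nothing has to be drained from a queue afterwards.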