Let’s assume I’m stuck using Python 2.6, and can’t upgrade (even if that would help). I’ve written a program that uses the Queue class. My producer is a simple directory listing. My consumer threads pull a file from the queue, and do stuff with it. If the file has already been processed, I skip it. The processed list is generated before all of the threads are started, so it isn’t empty.
Here’s some pseudo-code.
    import os, Queue, sys, threading

    processed = []

    def consumer():
        while True:
            file = dirlist.get(block=True)
            if file in processed:
                print "Ignoring %s" % file
            else:
                # do stuff here
                pass
            # called for every item, skipped or not
            dirlist.task_done()

    dirlist = Queue.Queue()
    for f in os.listdir("/some/dir"):
        dirlist.put(f)

    max_threads = 8
    for i in range(max_threads):
        thr = threading.Thread(target=consumer)
        thr.daemon = True  # blocked workers should not keep the process alive
        thr.start()

    dirlist.join()
The strange behavior I’m getting is that if a thread encounters a file that’s already been processed, the thread stalls out and waits until the entire program ends. I’ve done a little bit of testing, and the first 7 threads (assuming 8 is the max) stop, while the 8th thread keeps processing, one file at a time. But, by doing that, I’m losing the entire reason for threading the application.
Am I doing something wrong, or is this the expected behavior of the Queue/threading classes in Python 2.6?
Since this problem only manifests itself when finding a file that’s already been processed, it seems like this is something to do with the processed list itself. Have you tried implementing a simple lock around it?
Threading tends to cause the strangest bugs, even if they seem like they “shouldn’t” happen. Using locks on shared variables is the first step to make sure you don’t end up with some kind of race condition that could cause threads to deadlock.
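As a rough sketch of what a lock around the processed list could look like: the names processed_lock, already_done and mark_done are made up for illustration, the file names are placeholders, and the import shim is only there so the snippet runs on both Python 2.6 and 3. It is not the asker's real code.

```python
# Sketch only: processed_lock, already_done and mark_done are hypothetical
# helpers, and the file names below are placeholders.
import threading

try:
    import Queue as queue  # Python 2.x
except ImportError:
    import queue  # Python 3.x

processed = ["a.txt"]  # built before any thread starts
processed_lock = threading.Lock()
dirlist = queue.Queue()


def already_done(name):
    # Hold the lock for every read of the shared list.
    with processed_lock:
        return name in processed


def mark_done(name):
    # Hold the same lock for every write.
    with processed_lock:
        processed.append(name)


def consumer():
    while True:
        name = dirlist.get(block=True)
        try:
            if already_done(name):
                print("Ignoring %s" % name)
            else:
                # "do stuff here" would go in this branch
                mark_done(name)
        finally:
            # Runs for skipped and handled files alike, so
            # dirlist.join() never waits on a skipped file.
            dirlist.task_done()


for f in ["a.txt", "b.txt", "c.txt"]:
    dirlist.put(f)

for i in range(4):
    thr = threading.Thread(target=consumer)
    thr.daemon = True  # blocked workers do not keep the process alive
    thr.start()

dirlist.join()
```

Putting task_done() in a finally block is a design choice here: even if the work in the else branch raises, the queue's count of unfinished items still drops, so join() can return.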
Of course, if what you’re doing under # do stuff here is CPU-intensive, then Python will only run code from one thread at a time anyway, due to the Global Interpreter Lock. In that case, you may want to switch to the multiprocessing module; it’s very similar to threading, though you will need to replace shared variables with another solution (the multiprocessing documentation has a section on sharing state between processes).
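If you do go the multiprocessing route, one way to sidestep shared variables entirely is to filter out the processed files in the parent and let the workers return their results. This is only a sketch under that assumption: handle_file and pending_files are hypothetical names, handle_file stands in for the CPU-heavy step, and the file names are placeholders.

```python
# Sketch only: handle_file and pending_files are hypothetical helpers,
# and the file names below are placeholders.
import multiprocessing


def handle_file(name):
    # Runs in a worker process. It returns its result instead of touching
    # shared state, because each process has its own copy of module globals.
    return name, len(name)


def pending_files(names, processed):
    # Filter in the parent so the workers never need the processed list.
    done = set(processed)
    return [n for n in names if n not in done]


if __name__ == "__main__":
    processed = ["a.txt"]
    todo = pending_files(["a.txt", "b.txt", "c.txt"], processed)
    pool = multiprocessing.Pool(processes=2)
    try:
        for name, result in pool.map(handle_file, todo):
            processed.append(name)  # only the parent updates the list
    finally:
        pool.close()
        pool.join()
```

The __main__ guard matters on platforms where worker processes re-import the script rather than fork from it.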