When I run the code below and then call queue.qsize(), there are still 450,000 or so items in the queue, which implies that most lines of the text file were never read. Any idea what is going on here?
from Queue import Queue
from threading import Thread

lines = 660918 #int(str.split(os.popen('wc -l HGDP_FinalReport_Forward.txt').read())[0]) -1
queue = Queue()
File = 'HGDP_FinalReport_Forward.txt'
num_threads = 10
short_file = open(File)

class worker(Thread):
    def __init__(self, queue):
        Thread.__init__(self)
        self.queue = queue

    def run(self):
        while True:
            try:
                self.queue.get()
                i = short_file.readline()
                self.queue.task_done() #signal to the queue that the task is done
            except:
                break

## This is where I should make the call to the threads
def main():
    for i in range(num_threads):
        worker(queue).start()
    queue.join()

    for i in range(lines): # put the range of the number of lines in the .txt file
        queue.put(i)

main()
It’s hard to know exactly what you’re trying to do here, but if each line can be processed independently, multiprocessing is a much simpler choice that will take care of all the synchronization for you. An added bonus is that you don’t have to know the number of lines in advance. Basically, you write a function that handles a single line and map it over the open file with a pool of worker processes.
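A minimal sketch of the multiprocessing approach, under some assumptions: process_line and process_file are hypothetical names, and counting whitespace-separated fields is only a stand-in for whatever you actually need to do with each line. Pool.imap takes the file object directly, so the line count is never needed.

```python
from multiprocessing import Pool


def process_line(line):
    # Hypothetical per-line work (an assumption, not from the question):
    # count the whitespace-separated fields on the line.
    return len(line.split())


def process_file(path, num_workers=4):
    # The pool distributes lines to worker processes and handles all
    # of the queueing and synchronization internally.
    pool = Pool(processes=num_workers)
    try:
        with open(path) as handle:
            # imap iterates over the file object itself and returns
            # results in file order; chunksize batches lines to cut
            # down on inter-process overhead.
            return list(pool.imap(process_line, handle, chunksize=1000))
    finally:
        pool.close()
        pool.join()
```

You would call something like process_file('HGDP_FinalReport_Forward.txt') from under an `if __name__ == '__main__':` guard, which multiprocessing needs on platforms that start workers by re-importing the main module.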
Or, if you’re just trying to get some kind of aggregate result from the lines, you can use reduce() to fold them into a single value as they are read, which lowers memory usage.
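A rough sketch of the reduce version, assuming the aggregate you want is a running total; count_fields and aggregate_file are made-up names, and the field count is just a placeholder for your real computation. Importing reduce from functools works on Python 2.6+ and is required on Python 3.

```python
from functools import reduce


def count_fields(total, line):
    # Hypothetical aggregate step (an assumption): add this line's
    # whitespace-separated field count to the running total.
    return total + len(line.split())


def aggregate_file(path):
    with open(path) as handle:
        # reduce pulls one line at a time from the file iterator, so
        # only the running total and the current line are kept around.
        return reduce(count_fields, handle, 0)
```

This runs in a single process, which is often fine when the per-line work is cheap and reading the file is the bottleneck.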