I might be asking a very basic question but I really can’t figure how to make a simple parallel application in python.
I am running my scripts on a machine with 16 cores and I would like to use all of them efficiently. I have 16 huge files to read and I would like each cpu to read one file and then merge the result.
Here I give a quick example of what I would like to do:
parameter1_glob=[]
parameter2_glob[]
do cpu in arange(0,16):
parameter1,parameter2=loadtxt('file'+str(cpu)+'.dat',unpack=True)
parameter1_glob.append(parameter1)
parameter2_glob.append(parameter2)
I think that the multiprocessing module might help but I couldn’t understand how to apply it to what I want to do.
Do you want to merge line by line? Sometimes coroutines are more interesting for I/O bound applications than classic multitasking. You can chain generators and coroutines for all sort of routing, merging and broadcasting. Blow your mind with this nice presentation by David Beazley.
You can use a coroutine as a sink (untested, please refer to dabeaz examples):