I have a folder with 500 input files (total size of all files is ~500 MB).
I'd like to write a Python script that does the following:
(1) loads all of the input files into memory
(2) initializes an empty Python list that will be used later (see step (4))
(3) starts 15 different (independent) processes: each of these uses the same input data [from (1)] but applies a different algorithm to it, thus generating different results
(4) has all the independent processes [from step (3)] store their output in the same Python list [the list that was initialized in step (2)]
Once all 15 processes have completed their run, I will have one Python list that includes the results of all 15 independent processes.
My question is: is it possible to do the above efficiently in Python? If so, can you provide a scheme or sample code that illustrates how to do so?
Note #1: I will be running this on a powerful multi-core server, so the goal here is to use all the processing power while sharing some memory (the input data and the output list) among all the independent processes.
Note #2: I am working in a Linux environment.
OK, I just whipped this up using ZeroMQ to demonstrate a single subscriber to multiple publishers. You could probably do the same with queues, but you would need to manage them a bit more. ZeroMQ sockets just work, which makes them nice for things like this, IMO.
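For comparison, here is a rough sketch of what the queue-based alternative might look like with only the standard library. It is not the ZeroMQ version, just one way it could be done. It assumes Linux with the `fork` start method, so each child inherits the already-loaded input instead of receiving a pickled copy. The names `total_bytes`, `longest_name`, `ALGORITHMS`, `INPUT_DATA`, `worker` and `run_all` are made up for illustration, and the two toy functions stand in for the 15 real algorithms.

```python
import multiprocessing as mp


# Hypothetical stand-ins for the real algorithms; each receives the shared input.
def total_bytes(data):
    return sum(len(blob) for blob in data.values())


def longest_name(data):
    return max(data, key=len)


ALGORITHMS = [total_bytes, longest_name]

# Module-level holder: children created with fork inherit this object,
# so the input is not pickled and sent to each process.
INPUT_DATA = {}


def worker(algorithm, queue):
    # Runs in the child; INPUT_DATA is the parent's data as of the fork.
    queue.put((algorithm.__name__, algorithm(INPUT_DATA)))


def run_all(input_data, algorithms):
    """Run each algorithm in its own process and collect results in one list."""
    global INPUT_DATA
    INPUT_DATA = input_data          # must be set before the children are forked
    ctx = mp.get_context("fork")     # assumption: Linux, fork is available
    queue = ctx.Queue()
    procs = [ctx.Process(target=worker, args=(alg, queue)) for alg in algorithms]
    for p in procs:
        p.start()
    # Drain the queue before joining, so large results cannot block the children.
    # Note: this sketch does not handle a worker that crashes before calling put().
    results = [queue.get() for _ in procs]
    for p in procs:
        p.join()
    return results


if __name__ == "__main__":
    # Assumed input shape: file name -> file contents, already loaded into memory.
    data = {"a.txt": b"hello", "bb.txt": b"world!"}
    print(sorted(run_all(data, ALGORITHMS)))
```

The "shared" output list here lives only in the parent; the children hand their results back through the queue, which is the extra management mentioned above. Also keep in mind that forked children share the input pages copy-on-write, and CPython's reference counting can still cause some of those pages to be copied as the data is read.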
oh and to get zmq just