I’d like to know how multiprocessing is done right. Assuming I have a list

Asked: May 27, 2026 (2026-05-27T02:15:06+00:00)

I’d like to know how multiprocessing is done right. Assuming I have a list [1,2,3,4,5] generated by function f1 which is written to a Queue (left green circle). Now I start two processes pulling from that queue (by executing f2 in the processes). They process the data, say: doubling the value, and write it to the second queue. Now, function f3 reads this data and prints it out.

[Figure: layout of the data flow]

Inside the functions there is a kind of a loop, trying to read from the queue forever. How do I stop this process?

Idea 1

f1 sends not only the list but also a None object or a custom object (class PipelineTerminator: pass, or some such), which is simply propagated all the way down. f3 waits for the None to arrive and, when it does, breaks out of the loop. Problem: one of the two f2s may read and propagate the None while the other one is still processing a number. Then the last value is lost.

Idea 2

f3 is f1. So the function f1 generates the data and the pipes, spawns the processes with f2 and feeds all the data. After spawning and feeding, it listens on the second pipe, simply counting and processing the received objects. Because it knows how much data it fed, it can terminate the processes executing f2. But if the goal is to set up a processing pipeline, the different steps should be separable. So f1, f2 and f3 are different elements of a pipeline, and the expensive steps are done in parallel.

Idea 3

[Figure: pipeline idea 3]

Each piece of the pipeline is a function; this function spawns processes as it sees fit and is responsible for managing them. It knows how much data came in and how much has been returned (with yield, maybe). So it is safe to propagate a None object.

setup child processes 

execute thread one and two and wait until both finished

thread 1:
    while True:
        pull from input queue
        if None: break and set finished_flag
        else: push to queue1 and increment counter1

thread 2:
    while True:
        pull from queue2
        increment counter2
        yield result
        if counter1 == counter2 and finished_flag: break

when both threads finished: kill process pool and return.

(Instead of using threads, maybe one can think of a smarter solution.)
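One way idea 3 could look without hand-written threads and counters is sketched below. It assumes that the standard library's multiprocessing.Pool is an acceptable stand-in for the bookkeeping: Pool.imap tracks which inputs are still outstanding, so the stage simply ends when its input is exhausted. The names double and parallel_stage are hypothetical and chosen only for illustration.

```python
import multiprocessing as mp


def double(value):
    # Stand-in for the expensive f2 step (assumption: doubling, as in the question).
    return value * 2


def parallel_stage(func, items, processes=2):
    # A pipeline stage written as a generator: it owns its worker pool,
    # yields results in input order, and shuts the pool down once the
    # input iterable is exhausted.
    with mp.Pool(processes) as pool:
        for result in pool.imap(func, items):
            yield result


if __name__ == "__main__":
    source = iter([1, 2, 3, 4, 5])              # plays the role of f1
    for value in parallel_stage(double, source):
        print(value)                            # plays the role of f3
```

Because the stage is an ordinary generator, it can be plugged between other generator functions in the same way as a single-process pipeline step.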

So …

I have implemented a solution following idea 2, feeding and waiting for the results to arrive, but it was not really a pipeline with independent functions plugged together. It worked for the task I had to manage, but was hard to maintain.

I’d like to hear from you now how you usually implement and manage pipelines (easy in one process with generator functions and so on, but with multiple processes?).

1 Answer

Editorial Team · Answer 1 · 2026-05-27T02:15:06+00:00

What would be wrong with using idea 1, but with each worker process (f2) putting a custom object carrying its identifier on the output queue when it is done? Then f3 would just terminate that worker, and stop once there was no worker process left.
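A minimal sketch of that variant follows. It makes a few assumptions that are not in the question: f1 puts one None per worker on the first queue so that every f2 sees a stop signal, the marker class WorkerDone and the helper run_pipeline are hypothetical names, and f1 and f3 run in the parent process to keep the example short. Since each worker puts all of its results on the queue before its own marker, no value should be lost by the time f3 has seen every marker.

```python
import multiprocessing as mp

NUM_WORKERS = 2  # assumption: two f2 workers, as in the question


class WorkerDone:
    """Hypothetical marker a worker sends downstream when it has finished."""

    def __init__(self, worker_id):
        self.worker_id = worker_id


def f1(out_q, data, num_workers):
    # Feed the data, then one None per worker so every worker gets a stop signal.
    for item in data:
        out_q.put(item)
    for _ in range(num_workers):
        out_q.put(None)


def f2(worker_id, in_q, out_q):
    # Double each value; on None, announce completion and exit.
    while True:
        item = in_q.get()
        if item is None:
            out_q.put(WorkerDone(worker_id))
            break
        out_q.put(item * 2)


def f3(in_q, num_workers):
    # Collect results until every worker has reported that it is done.
    results, remaining = [], set(range(num_workers))
    while remaining:
        item = in_q.get()
        if isinstance(item, WorkerDone):
            remaining.discard(item.worker_id)
        else:
            results.append(item)
    return results


def run_pipeline(data, num_workers=NUM_WORKERS):
    q1, q2 = mp.Queue(), mp.Queue()
    workers = [mp.Process(target=f2, args=(i, q1, q2)) for i in range(num_workers)]
    for w in workers:
        w.start()
    f1(q1, data, num_workers)
    results = f3(q2, num_workers)
    for w in workers:
        w.join()  # workers exit on their own after sending WorkerDone
    return results


if __name__ == "__main__":
    # Arrival order depends on scheduling, so sort before printing.
    print(sorted(run_pipeline([1, 2, 3, 4, 5])))
```

In this sketch the workers exit by themselves after sending their marker, so f3 only needs to count markers and the parent joins the processes instead of terminating them.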

Also, new in Python 3.2 is the concurrent.futures package in the standard library, which should do what you are trying to do in the “right way” ™:
http://docs.python.org/dev/library/concurrent.futures.html

Maybe it is possible to find a backport of concurrent.futures to Python 2.x series.
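As a rough illustration of the concurrent.futures route, the sketch below runs the f2 step in a ProcessPoolExecutor. The bodies of f1, f2 and f3 and the helper run_pipeline are assumptions made for the example; only ProcessPoolExecutor and its map method come from the library. The executor starts and stops the worker processes itself, so no terminator object is needed.

```python
from concurrent.futures import ProcessPoolExecutor


def f1():
    # Produce the input data (the list from the question).
    return [1, 2, 3, 4, 5]


def f2(value):
    # The expensive step, run in the worker processes: double the value.
    return value * 2


def f3(results):
    # Consume the results; here they are simply collected into a list.
    return list(results)


def run_pipeline(max_workers=2):
    # The with-block shuts the worker processes down when it exits.
    with ProcessPoolExecutor(max_workers=max_workers) as pool:
        # map() yields results in input order and ends when the input is exhausted.
        return f3(pool.map(f2, f1()))


if __name__ == "__main__":
    print(run_pipeline())  # [2, 4, 6, 8, 10]
```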
