I’m trying to use Python’s multiprocessing module to run a distributed task across a couple of machines, and I’ve been using this blog post as a reference.
However, this post’s task uses a job queue and puts the results into a results queue, both of which are managed by JobQueueManager (which subclasses SyncManager). The manager starts a server that keeps running until the results queue has filled up, at which point it calls manager.shutdown().
My issue is that my task doesn’t require a results queue, so I’m trying to figure out how to know when to stop the server. I could have the server run continually with serve_forever and then stop it manually, or I could create a dummy queue that fills up in the same way as in the example and stop the server once the queue holds as many items as there were jobs.
I’d prefer not to manually stop it, but the second solution seems rather hacky. It seems one common way (without the server) is to call join() on each process, but I don’t know if there’s a way for the manager to find out which process removed each job from the queue.
My fallback plan is a variant of the dummy queue method that uses a shared counter variable, incremented as the last step of each process. I’d like to know whether this is unreliable, or whether there are any suggestions that use methods from the multiprocessing library.
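To make the dummy queue option concrete, here is a rough sketch of what I have in mind. The function names (run_worker, wait_until_done, handle_job) are placeholders of mine, not anything from the blog post, and I’m assuming both queues behave like queue.Queue, which as far as I can tell is how the manager’s queue proxies behave.

```python
import queue


def run_worker(job_q, done_q, handle_job):
    """Worker side: drain the job queue, reporting each finished job."""
    while True:
        try:
            job = job_q.get_nowait()
        except queue.Empty:
            return  # no jobs left, so this worker is finished
        handle_job(job)  # hypothetical per-job work (e.g. write to a database)
        done_q.put(job)  # last step: tell the server this job is done


def wait_until_done(done_q, n_jobs):
    """Server side: block until n_jobs completion markers have arrived."""
    for _ in range(n_jobs):
        done_q.get()  # blocks until some worker reports a finished job
```

The server would call wait_until_done(done_q, n_jobs) and then manager.shutdown(), so the dummy queue only ever carries completion markers and no real results.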
Thanks
Edit: I didn’t mention that the reason I don’t use a results queue is that I’m storing the results of my processing in a Redis database.
As my update shows, I already use a Redis database to store the results of my tasks, so I don’t have to worry about managing a dict between different machines.
The solution that I ended up going with also uses the Redis db. Whenever a process finishes, I have it push a string with the process’ info to a list (r_server.lpush(...) in redis-py). On the server side, instead of using a blocking get method for the result queue, I use Redis’ blocking pop, rs.blpop(), which works the same way.
This is pretty much the same as the blog post’s approach and the other suggestions here to make a dummy queue and use get(), but just with Redis, so I don’t have the overhead of additional method arguments and registering extra methods with the manager.
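A minimal sketch of that pattern is below. It is not my exact code: the function names, the key argument, and the timeout handling are illustrative assumptions, and the client is passed in so that anything exposing lpush and blpop (such as a redis-py client) can be used.

```python
def report_done(client, key, info):
    """Worker side: push a completion marker as the last step of a job."""
    client.lpush(key, info)


def wait_for_all(client, key, n_jobs, timeout=0):
    """Server side: block until n_jobs completion markers have been popped.

    timeout=0 asks BLPOP to block indefinitely; with a positive timeout,
    blpop returns None when it expires and we stop waiting early.
    """
    finished = []
    while len(finished) < n_jobs:
        item = client.blpop(key, timeout=timeout)
        if item is None:
            break  # timed out before every job reported in
        _key, value = item  # blpop returns a (key, value) pair
        finished.append(value)
    return finished
```

With redis-py, the client would be something like redis.Redis(host=..., port=...), with the connection details depending on your setup. Once wait_for_all returns, the server can call manager.shutdown(). Note that redis-py returns values as bytes unless the client is created with decode_responses=True.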