My scenario is this:
- I’ve got a worker which enqueues tasks into a multiprocessing.Queue() whenever that queue is empty. This is to ensure that tasks are executed in priority order, since multiprocessing.Queue() doesn’t do priorities.
- There are a number of workers which pop from the mp.Queue and do some stuff. Sometimes (<0.1%) these fail and die without having the possibility to re-enqueue the task.
- My tasks are locked via a central database and may only run once (hard requirement). For this they have certain states which they can transition from/to.
My current solution: let all workers report via another queue which tasks have been completed, and introduce a deadline by which each task has to be done. If a deadline is reached, reset the task and re-enqueue it. The problem is that this solution is “soft”, i.e. the deadline is arbitrary.
I am searching for the simplest possible solution. Is there a simpler or a more stringent solution to this?
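For reference, the deadline bookkeeping described above could look roughly like the sketch below. `DeadlineTracker`, its method names, and the `timeout_s` parameter are hypothetical and only illustrate the idea; the database locking and state transitions are left out.

```python
import time


class DeadlineTracker:
    """Tracks a deadline per started task (hypothetical helper, for illustration only)."""

    def __init__(self, timeout_s):
        self.timeout_s = timeout_s
        self._deadlines = {}  # task_id -> absolute deadline

    def started(self, task_id, now=None):
        # Record the deadline at the moment the task is handed to a worker.
        now = time.monotonic() if now is None else now
        self._deadlines[task_id] = now + self.timeout_s

    def completed(self, task_id):
        # A worker reported the task on the "completed" queue.
        self._deadlines.pop(task_id, None)

    def pop_expired(self, now=None):
        # Return (and forget) every task whose deadline has passed.
        # The caller would reset these tasks and re-enqueue them.
        now = time.monotonic() if now is None else now
        expired = [t for t, d in self._deadlines.items() if d <= now]
        for t in expired:
            del self._deadlines[t]
        return expired
```

The weakness is visible in `timeout_s`: a slow but healthy worker looks the same as a dead one once the deadline passes.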
This solution uses three queues to keep track of the work (simulated as WORK_ID):
- todo_q: Any work to be done (including that to be redone if the process died in-flight)
- start_q: Any work that has been started by a process
- finish_q: Any work that has been completed
Using this method you should not need a timer. As long as you assign a process identifier and keep track of assignments, check to see whether Process.is_alive(). If the process died, then add that work back to the todo queue.
In the code below, I simulate a worker process dying 25% of the time…
Running this on my laptop…
I tested this with over 10000 work items at a 25% mortality rate.
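The three-queue scheme described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the listing that was tested: the names `worker` and `run_supervisor` are made up here, the `fork` start method (POSIX only) is assumed, and `multiprocessing.SimpleQueue` is used because its `put()` writes to the pipe synchronously, so a report written just before a crash is not lost in a feeder-thread buffer.

```python
import itertools
import multiprocessing as mp
import os
import random
import time


def worker(worker_id, todo_q, start_q, finish_q, death_rate):
    rng = random.Random()  # fresh seed per process
    while True:
        work_id = todo_q.get()
        if work_id is None:  # sentinel: shut down cleanly
            return
        start_q.put((worker_id, work_id))  # announce the assignment
        if rng.random() < death_rate:
            os._exit(1)  # simulated crash: no cleanup, no re-enqueue
        time.sleep(0.001)  # stand-in for the real work
        finish_q.put((worker_id, work_id))


def run_supervisor(n_items=40, n_workers=4, death_rate=0.25):
    ctx = mp.get_context("fork")  # assumption: POSIX platform
    # Note: SimpleQueue.put() blocks once the pipe is full, so very large
    # batches would have to be fed into todo_q incrementally.
    todo_q, start_q, finish_q = (ctx.SimpleQueue() for _ in range(3))
    procs = {}  # worker_id -> Process
    ids = itertools.count()

    def spawn():
        wid = next(ids)
        p = ctx.Process(target=worker, daemon=True,
                        args=(wid, todo_q, start_q, finish_q, death_rate))
        p.start()
        procs[wid] = p

    for _ in range(n_workers):
        spawn()
    for work_id in range(n_items):
        todo_q.put(work_id)

    in_flight = {}  # worker_id -> work_id it last reported starting
    done = set()
    requeued = 0
    while len(done) < n_items:
        # Snapshot deaths BEFORE draining, so every report a dead worker
        # wrote has been read by the time its assignment is inspected.
        dead = [wid for wid, p in procs.items() if not p.is_alive()]
        while not start_q.empty():
            wid, work_id = start_q.get()
            in_flight[wid] = work_id
        while not finish_q.empty():
            wid, work_id = finish_q.get()
            done.add(work_id)
            if in_flight.get(wid) == work_id:
                del in_flight[wid]
        for wid in dead:
            procs.pop(wid).join()
            work_id = in_flight.pop(wid, None)
            if work_id is not None and work_id not in done:
                todo_q.put(work_id)  # redo the work that died in-flight
                requeued += 1
            spawn()  # keep the pool at full strength
        time.sleep(0.005)

    for _ in procs:
        todo_q.put(None)
    for p in procs.values():
        p.join()
    return done, requeued


if __name__ == "__main__":
    done, requeued = run_supervisor()
    print(f"completed {len(done)} items, re-enqueued {requeued}")
```

One gap remains in this sketch: a worker that dies after taking an item from `todo_q` but before writing to `start_q` would lose that item. The simulation never crashes in that window; a real system would need the central database state to catch it.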