I have an executable file which I need to run very often, with different parameters. For this I wrote a small Python (2.7) wrapper, using the multiprocessing module, following the pattern given here.
My code looks like this:
try:
    logging.info("starting pool runs")
    pool.map(run_nlin, params)
    pool.close()
except KeyboardInterrupt:
    logging.info("^C pressed")
    pool.terminate()
except Exception, e:
    logging.info("exception caught: %s", e)
    pool.terminate()
finally:
    time.sleep(5)
    pool.join()
    logging.info("done")
My worker function is here:
class KeyboardInterruptError(Exception): pass
def run_nlin((path_config, path_log, path_nlin, update_method)):
    try:
        with open(path_log, "w") as log_:
            cmdline = [path_nlin, path_config]
            if update_method:
                cmdline += [update_method, ]
            sp.call(cmdline, stdout=log_, stderr=log_)
    except KeyboardInterrupt:
        time.sleep(5)
        raise KeyboardInterruptError()
    except:
        raise
path_config is the path to a configuration file for the binary program; it contains, for example, the date to run the program for.
When I start the wrapper, everything looks fine. However, when I press ^C, the wrapper script seems to launch an additional numproc processes from the pool before terminating. As an example, when I start the script for days 1-10, I can see in the ps aux output that two instances of the binary program are running (usually for days 1 and 3). Now, when I press ^C, the wrapper script exits, the binary programs for days 1 and 3 are gone, but there are new binary programs running for days 5 and 7.
So to me it seems as if the Pool launches another numproc processes before finally dying.
Any ideas what’s happening here, and what I can do about it?
On this page, Jesse Noller, author of the multiprocessing module, shows that the correct way to handle KeyboardInterrupt is to have the subprocesses return, not re-raise the exception. This allows the main process to terminate the pool.
However, as the code below shows, the main process does not reach the except KeyboardInterrupt block until after all the tasks generated by pool.map have been run. This is why (I believe) you are seeing extra calls to your worker function, run_nlin, after Ctrl-C has been pressed.
One possible workaround is to have all the worker functions test if a multiprocessing.Event has been set. If the event has been set, then have the worker bail out early, otherwise, go ahead with the long calculation.
Running the script yields:
Here Ctrl-C is pressed; each of the workers sets the terminating event. We really only need one to set it, but this works despite the small inefficiency.
Now all the other tasks queued by pool.map are run:
Finally the main process reaches the except KeyboardInterrupt block.
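For reference, the Event-based workaround described above can be sketched roughly as follows. This is only a minimal sketch under some assumptions, not the original script: the names initializer, worker and run_all are hypothetical, time.sleep stands in for the long calculation (in your case, the sp.call inside run_nlin), and the Event is assumed to be handed to each worker process through the Pool's initializer.

```python
import multiprocessing as mp
import time

terminating = None  # per-worker global, filled in by initializer()


def initializer(event):
    """Runs once in every worker process and stores the shared Event."""
    global terminating
    terminating = event


def worker(n):
    """Hypothetical task: square n after a short pause."""
    if terminating.is_set():
        # Ctrl-C was already seen by some worker: bail out early.
        return None
    try:
        time.sleep(0.05)  # stand-in for the long calculation
        return n * n
    except KeyboardInterrupt:
        # Return instead of re-raising, and tell later tasks to skip.
        terminating.set()
        return None


def run_all(tasks, processes=2):
    """Map worker over tasks with a shared 'terminating' Event."""
    event = mp.Event()
    pool = mp.Pool(processes, initializer=initializer, initargs=(event,))
    try:
        results = pool.map(worker, tasks)
        pool.close()
    except KeyboardInterrupt:
        pool.terminate()
        results = None
    finally:
        pool.join()
    return results


if __name__ == "__main__":
    print(run_all(range(6)))
```

Note that the tasks still queued by pool.map are handed to the workers after Ctrl-C, as before; the difference is that each of them sees the event and returns immediately instead of starting another long run.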