I have a set of command line tools that I’d like to run in

Asked: June 14, 2026 (2026-06-14T00:07:55+00:00)

I have a set of command line tools that I’d like to run in parallel on a series of files. I’ve written a python function to wrap them that looks something like this:

import os
import shlex
import subprocess

def process_file(fn):
    print(os.getpid())
    cmd1 = "echo "+fn
    p = subprocess.Popen(shlex.split(cmd1))

    # after cmd1 finishes
    other_python_function_to_do_something_to_file(fn)

    cmd2 = "echo "+fn
    p = subprocess.Popen(shlex.split(cmd2))
    print("finish")

if __name__=="__main__":
    import multiprocessing
    p = multiprocessing.Pool()
    for fn in files:
        RETURN = p.apply_async(process_file,args=(fn,),kwds={some_kwds})

While this works, it does not seem to be running multiple processes; it seems like it’s just running in serial (I’ve tried using Pool(5) with the same result). What am I missing? Are the calls to Popen “blocking”?

EDIT: Clarified a little. I need cmd1, then some python command, then cmd2, to execute in sequence on each file.

EDIT2: The output from the above has the pattern:

pid
finish
pid
finish
pid
finish

whereas a similar call, using map in place of apply (but without any provision for passing kwds) looks more like

pid
pid
pid
finish
finish
finish

However, the map call sometimes (always?) hangs after apparently succeeding.

1 Answer

Editorial Team · Answer 1 · 2026-06-14T00:07:56+00:00

Are the calls to Popen “blocking”?

No. Just creating a subprocess.Popen returns immediately, giving you an object that you could wait on or otherwise use. If you want to block, that’s simple:

subprocess.check_call(shlex.split(cmd1))

Meanwhile, I’m not sure why you’re joining your args into a string and then using shlex to split them back into a list. That round trip breaks as soon as a filename contains a space or a quote. Why not just write the list?

cmd1 = ["echo", fn]
subprocess.check_call(cmd1)
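To make the difference concrete, here is a minimal sketch that times both calls. It uses Python 3 syntax, and SLOW_CMD, time_popen and time_check_call are made-up names for illustration; the "tool" is just a child Python that sleeps for half a second, standing in for a real command.

```python
import subprocess
import sys
import time

# Hypothetical stand-in for a slow command-line tool: a child Python that sleeps.
SLOW_CMD = [sys.executable, "-c", "import time; time.sleep(0.5)"]


def time_popen():
    """Return (seconds until Popen returned, seconds until the child exited)."""
    start = time.monotonic()
    proc = subprocess.Popen(SLOW_CMD)   # returns as soon as the child is started
    launched = time.monotonic() - start
    proc.wait()                         # explicitly block until the child exits
    finished = time.monotonic() - start
    return launched, finished


def time_check_call():
    """Return the seconds check_call took; it waits for the child itself."""
    start = time.monotonic()
    subprocess.check_call(SLOW_CMD)     # raises CalledProcessError on a nonzero exit
    return time.monotonic() - start


if __name__ == "__main__":
    launched, finished = time_popen()
    print("Popen returned after %.3fs, child exited after %.3fs" % (launched, finished))
    print("check_call returned after %.3fs" % time_check_call())
```

In the question's function, nothing waits on either Popen object, so the Python step can start before cmd1 has finished.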

While this works, it does not seem to be running multiple processes; it seems like it’s just running in serial

What makes you think this? Given that each process just kicks off two processes into the background as fast as possible, it’s going to be pretty hard to tell whether they’re running in parallel.

If you want to verify that the work is being spread across multiple processes, you may want to add some prints or logging (and throw something like os.getpid() into the messages).

Meanwhile, it looks like you’re trying to exactly duplicate the effects of multiprocessing.Pool.map_async with a loop around multiprocessing.Pool.apply_async, except that instead of accumulating the results you’re stashing each one in a variable called RETURN and then throwing it away before you can use it. Why not just use map_async?
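A minimal sketch of the map_async version might look like the following. It uses Python 3 syntax. The tag keyword, the file names and the upper-casing step are placeholders for the question's unspecified kwds, files and Python function; functools.partial is one way to get keyword arguments through map_async, which only passes a single positional argument to the worker.

```python
import functools
import multiprocessing
import os
import subprocess
import sys


def process_file(fn, tag="done"):
    """Run cmd1, a Python step, then cmd2, in sequence, for one file.

    `tag` is a hypothetical keyword argument standing in for the
    question's unspecified kwds.
    """
    # Stand-in for the real tools: a child Python that echoes the file name.
    cmd = [sys.executable, "-c", "import sys; print(sys.argv[1])", fn]
    subprocess.check_call(cmd)              # cmd1: blocks until it finishes
    summary = "%s:%s" % (fn.upper(), tag)   # placeholder for the Python step
    subprocess.check_call(cmd)              # cmd2: blocks until it finishes
    return os.getpid(), summary


def main(files):
    # Bind the keyword arguments up front so each task needs only the file name.
    worker = functools.partial(process_file, tag="checked")
    pool = multiprocessing.Pool(4)
    try:
        async_result = pool.map_async(worker, files)
        results = async_result.get()        # waits for all tasks; re-raises worker errors
    finally:
        pool.close()
        pool.join()
    return results


if __name__ == "__main__":
    for pid, summary in main(["a.txt", "b.txt", "c.txt"]):
        print(pid, summary)
```

Calling get() matters for more than the return values: an exception raised inside a worker only surfaces in the parent when the result is fetched, so discarding the result object also discards any errors.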

Finally, you asked whether multiprocessing is the right tool for the job. Well, you clearly need something asynchronous: check_call(args(file1)) has to block other_python_function_to_do_something_to_file(file1), but at the same time not block check_call(args(file2)).

I would probably have used threading, but really, it doesn’t make much difference. Even if you’re on a platform where process startup is expensive, you’re already paying that cost because the whole point is running N * M child processes, so another pool of 8 isn’t going to hurt anything. And there’s little risk of either accidentally creating races by sharing data between threads, or accidentally writing code that looks like it shares data between processes but doesn’t, since there’s nothing to share. So, whichever one you like more, go for it.
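For comparison, a threaded version could be sketched like this, assuming Python 3 and concurrent.futures. The names run_all and process_file, the file names and the upper-casing step are placeholders. Threads are enough here because the heavy work happens in the child processes; each thread mostly sits waiting for its child to exit.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


def process_file(fn):
    """Run cmd1, a Python step, then cmd2, in sequence, for one file."""
    # Stand-in for the real tools: a child Python that echoes the file name.
    cmd = [sys.executable, "-c", "import sys; print(sys.argv[1])", fn]
    out1 = subprocess.check_output(cmd)   # cmd1: blocks this thread only
    marker = fn.upper()                   # placeholder for the Python step
    out2 = subprocess.check_output(cmd)   # cmd2: starts only after the step above
    return marker, out1.decode().strip(), out2.decode().strip()


def run_all(files, workers=4):
    # executor.map yields results in the same order as `files`.
    with ThreadPoolExecutor(max_workers=workers) as executor:
        return list(executor.map(process_file, files))


if __name__ == "__main__":
    for row in run_all(["a.txt", "b.txt", "c.txt"]):
        print(row)
```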

The other alternative would be to write an event loop. Which I might actually start doing myself for this problem, but I’d regret it, and you shouldn’t do it…
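For the curious, here is roughly what that route looks like when borrowing a ready-made event loop (Python 3's asyncio) instead of hand-writing one. This is only a sketch: run_all, process_file, max_concurrent and the upper-casing step are placeholders, and a Python step that blocks for long would stall the loop unless it were pushed onto an executor, which is part of why this approach is more trouble than a pool.

```python
import asyncio
import sys


async def process_file(fn, limit):
    """Run cmd1, a Python step, then cmd2, in sequence, for one file."""
    # Stand-in for the real tools: a child Python that echoes the file name.
    cmd = [sys.executable, "-c", "import sys; print(sys.argv[1])", fn]
    async with limit:                     # cap how many files are in flight at once
        proc = await asyncio.create_subprocess_exec(
            *cmd, stdout=asyncio.subprocess.PIPE)
        out1, _ = await proc.communicate()    # cmd1: wait without blocking the loop
        marker = fn.upper()                   # placeholder step; must stay short
        proc = await asyncio.create_subprocess_exec(
            *cmd, stdout=asyncio.subprocess.PIPE)
        out2, _ = await proc.communicate()    # cmd2
    return marker, out1.decode().strip(), out2.decode().strip()


async def run_all(files, max_concurrent=4):
    limit = asyncio.Semaphore(max_concurrent)
    # gather returns results in the same order as `files`.
    return await asyncio.gather(*(process_file(fn, limit) for fn in files))


if __name__ == "__main__":
    for row in asyncio.run(run_all(["a.txt", "b.txt", "c.txt"])):
        print(row)
```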
