I wanted a Python equivalent to piping shell commands in Perl, something like the Python version of open(PIPE, "command |").
I went to the subprocess module and tried this:
p = subprocess.Popen("zgrep thingiwant largefile", shell=True, stdout=subprocess.PIPE)
This works for reading the output the same way I would in Perl, but it doesn’t clean up after itself. When I exit the interpreter, I get
grep: writing output: Broken pipe
spewed all over stderr a few million times. I guess I had naively hoped all this would be taken care of for me, but that’s not true. Calling terminate or kill on p doesn’t seem to help. Looking at the process table, I see that this kills the /bin/sh process but leaves the child gzip in place to complain about the broken pipe.
What’s the right way to do this?
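The issue is that the pipe is full. The subprocess stops, waiting for the pipe to empty out, but then your process (the Python interpreter) quits, breaking its end of the pipe (hence the error message).
p.wait() will not help you: the subprocess docs warn that it will deadlock if the child produces enough output to fill the OS pipe buffer.
p.communicate() will not help you: the docs note that the data it reads is buffered in memory, so it should not be used when the output is large or unlimited.
p.stdout.read(num_bytes) will not help you: the docs warn to use communicate() rather than .stdout.read to avoid deadlocks caused by other OS pipe buffers filling up.
The moral of the story is that, for large output, subprocess.PIPE will doom you to certain failure if your program is trying to read the data. (It seems to me that you should be able to put p.stdout.read(bytes) into a while p.returncode is None: loop, but the above warning suggests that this could deadlock.)
The docs suggest replacing a shell pipe with this: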
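For reference, the "replacing a shell pipeline" pattern in the subprocess documentation looks roughly like the sketch below. The wrapper function run_pipeline is a hypothetical name of mine, and using gzip -dc plus grep as a stand-in for zgrep is an assumption, not something taken from the docs or the question.

```python
import subprocess

def run_pipeline(first_cmd, second_cmd):
    """Run `first_cmd | second_cmd` without a shell; return second_cmd's stdout as bytes."""
    p1 = subprocess.Popen(first_cmd, stdout=subprocess.PIPE)
    # p2 reads straight from p1's stdout, so the data never passes through Python.
    p2 = subprocess.Popen(second_cmd, stdin=p1.stdout, stdout=subprocess.PIPE)
    # Close our copy of the pipe so p1 can receive SIGPIPE if p2 exits early.
    p1.stdout.close()
    output = p2.communicate()[0]
    p1.wait()  # reap p1 so it does not linger as a zombie
    return output

# Assumed equivalent of the question's zgrep call (file name and pattern from the question):
# matches = run_pipeline(["gzip", "-dc", "largefile"], ["grep", "thingiwant"])
```

Note that communicate() still holds the second command's output in memory, so this fits best when the filtered result is small compared to the raw input.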
Notice that p2 is taking its standard input directly from p1. This should avoid deadlocks, but given the contradictory warnings above, who knows.
Anyway, if that last part doesn’t work for you (it should, though), you could try creating a temporary file, writing all data from the first call to that, and then using the temporary file as input to the next process.
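The temporary-file fallback could be sketched like this. The function name run_via_tempfile is hypothetical, and I am assuming the first command's full output fits on disk.

```python
import subprocess
import tempfile

def run_via_tempfile(first_cmd, second_cmd):
    """Spool first_cmd's stdout to a temporary file, then feed that file to second_cmd."""
    with tempfile.TemporaryFile() as spool:
        # The first process writes to disk, so it never blocks on a full pipe.
        subprocess.check_call(first_cmd, stdout=spool)
        spool.seek(0)  # rewind before handing the file to the next process
        # check_output collects the second command's stdout in memory and
        # raises CalledProcessError on a non-zero exit status.
        return subprocess.check_output(second_cmd, stdin=spool)
```

One caveat with this sketch: grep exits with status 1 when nothing matches, which check_output reports as an error, so a real caller would probably want to catch subprocess.CalledProcessError for that case.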