I'm new to Python's subprocess module. My current implementation is not multiprocessed:
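import subprocess

def forcedParsing(fname):
    # Pass the arguments as a list: no shell and no shlex quoting needed,
    # so file names containing quotes or spaces are handled safely.
    args = ['strings', fname]
    try:
        sp = subprocess.Popen(args, shell=False,
                              stdout=subprocess.PIPE, stderr=subprocess.PIPE)
        # communicate() waits for the process and reads both pipes
        out, err = sp.communicate()
    except OSError as e:
        # Use the caught exception instance, not the OSError class
        print("Error no %s Message %s" % (e.errno, e.strerror))
        return None  # sp may not exist here, so stop instead of falling through
    if sp.returncode == 0:
        #print("Processed %s" % fname)
        return out
    return None

res = []
for f in file_list:  # file_list: the list of file names to process
    res.append(forcedParsing(f))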
My questions:
- Is sp.communicate a good way to go, or should I use poll? If I use poll, do I need a separate process that monitors whether the process has finished?
- Should I fork at the for loop?
About question 2: forking at the for loop will mostly speed things up if the script is supposed to run on a system with multiple cores/processors. It will consume more memory, though, and will stress IO harder. There will be a sweet spot somewhere that depends on the number of files in file_list, but only benchmarking on a realistic target system can tell you where it is. If you find that number, you could add an if len(file_list) > <your number>: check with optional fork()'ing [Edit: or rather, as @tokland says, via multiprocessing if it's available on your Python version (2.6+)] that chooses the most efficient strategy on a per-job basis. Read about Python profiling here: http://docs.python.org/library/profile.html
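A minimal sketch of that threshold idea, assuming Python 3 and the strings tool on the PATH. The names forced_parsing, parse_all and PARALLEL_THRESHOLD are hypothetical, and the threshold value is only a placeholder for the number you would find by benchmarking. Instead of calling fork() directly, it swaps in multiprocessing.pool.ThreadPool: the real work happens in the external strings processes, so threads that merely wait on communicate() are enough to keep several of them running at once, and nothing has to be pickled between processes.

```python
import subprocess
from multiprocessing.pool import ThreadPool

# Hypothetical cut-off: the real "sweet spot" has to be found by
# benchmarking on the target system. 50 is only a placeholder.
PARALLEL_THRESHOLD = 50


def forced_parsing(fname):
    """Run `strings` on one file; return its stdout, or None on failure."""
    try:
        proc = subprocess.Popen(["strings", fname],
                                stdout=subprocess.PIPE,
                                stderr=subprocess.PIPE)
        out, _err = proc.communicate()
    except OSError as e:  # e.g. `strings` is not installed
        print("Error no %s Message %s" % (e.errno, e.strerror))
        return None
    return out if proc.returncode == 0 else None


def parse_all(file_list, threshold=PARALLEL_THRESHOLD, workers=None):
    """Serial loop for short lists, a pool of worker threads for long ones.

    Each worker thread only waits on its own `strings` child process,
    so several children can run concurrently on separate cores.
    workers=None lets the pool use one thread per CPU.
    """
    if len(file_list) <= threshold:
        return [forced_parsing(f) for f in file_list]
    pool = ThreadPool(workers)
    try:
        # map() returns results in the same order as file_list
        return pool.map(forced_parsing, file_list)
    finally:
        pool.close()
        pool.join()
```

Usage would be res = parse_all(file_list). If the Python-side work per file were CPU-bound rather than waiting on a child process, multiprocessing.Pool offers the same map() interface backed by real processes.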
If you're on Linux, you can also run your script under the time command: http://linuxmanpages.com/man1/time.1.php
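To find the sweet spot from inside Python, you can time the loop itself at several worker counts. A minimal sketch, assuming Python 3; benchmark is a hypothetical helper, and it uses multiprocessing.pool.ThreadPool, which suits jobs that mostly wait on external processes such as strings. Pool start-up is deliberately included in the measured time, since that overhead is part of what decides whether parallelism pays off for a given number of files.

```python
import timeit
from multiprocessing.pool import ThreadPool


def benchmark(func, items, workers_list=(1, 2, 4, 8), repeat=3):
    """Time func over items for several worker counts.

    Returns {workers: best wall-clock time in seconds}; workers == 1 means
    a plain serial loop with no pool at all.
    """
    timings = {}
    for workers in workers_list:
        best = float("inf")
        for _ in range(repeat):
            start = timeit.default_timer()
            if workers == 1:
                for item in items:
                    func(item)
            else:
                pool = ThreadPool(workers)
                try:
                    pool.map(func, items)
                finally:
                    pool.close()
                    pool.join()
            # Keep the fastest run to reduce noise from other system load
            best = min(best, timeit.default_timer() - start)
        timings[workers] = best
    return timings
```

For example, benchmark(forcedParsing, file_list) run on the actual target machine with a realistic file_list shows where adding workers stops paying off.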