I asked a similar question before but got no helpful response, so I will try to make things clearer.
I want to run a certain Linux command using a multithreaded or, preferably, multiprocessing approach. If anyone is familiar with Picard: I want to run an earlier version on a BAM file and, at the same time, run a newer version on the same BAM file. The idea is to test how much quicker the newer version is and whether it gives the same result.
My main problem is that I have no idea how to apply multiprocessing to a Popen command. For example:
cmd1 = ['nice', 'time', 'java', '-Xmx6G', '-jar', '/comparison/old_picard/MarkDuplicates.jar', 'I=/comparison/old.bam', 'O=/comparison/old_picard/markdups/old.dupsFlagged.bam', 'M=/comparison/old_picard/markdups/old.metrics.txt', 'TMP_DIR=/comparison', 'VALIDATION_STRINGENCY=LENIENT', 'ASSUME_SORTED=true']
cmd2 = ['nice', 'time', 'java', '-Xmx6G', '-jar', '/comparison/new_picard/MarkDuplicates.jar', 'I=/comparison/new.bam', 'O=/comparison/new_picard/markdups/new.dupsFlagged.bam', 'M=/comparison/new_picard/markdups/new.metrics.txt', 'TMP_DIR=/comparison', 'VALIDATION_STRINGENCY=LENIENT', 'ASSUME_SORTED=true']
c1 = subprocess.Popen(cmd1, stdout=subprocess.PIPE)
c2 = subprocess.Popen(cmd2, stdout=subprocess.PIPE)
And then I have a timer function:
def timeit(c):
    past = time.time()
    results = [c.communicate()]
    present = time.time()
    total = present - past
    results.append(total)
    return results
What I WANT to do is this:
p = Process(target=timeit, args=(c1,c2))
p.start()
p.join()
However, I get a “Popen object not iterable” error. Does anyone have a better idea than what I have now? I don’t want to go off in a completely different direction only to hit another wall. In summary, I want to run c1 on one CPU and c2 on the other at the same time. Please help!
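Instead of passing the subprocess.Popen object (the command starts running as soon as the Popen object is created, not when timeit is called, so the timer would miss part of the run), pass the command itself and create the Popen inside the timed function: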
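A minimal sketch of that idea, not necessarily the exact code originally posted here. It assumes it is enough to wait on each command from its own worker thread, since every Popen is already a separate OS process; the multiprocessing equivalent would be one Process(target=timeit, args=(cmd1,)) per command, with the trailing comma making args a tuple. The names run_in_parallel, demo_cmd1 and demo_cmd2 are hypothetical, and the demo commands are placeholders standing in for cmd1 and cmd2 from the question.

```python
import subprocess
import sys
import time
from concurrent.futures import ThreadPoolExecutor


def timeit(cmd):
    """Start cmd, wait for it to finish, and return (stdout, elapsed seconds)."""
    start = time.perf_counter()
    # The Popen is created inside the timed region, so the clock covers the whole run.
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE)
    stdout, _ = proc.communicate()
    return stdout, time.perf_counter() - start


def run_in_parallel(cmds):
    """Time all commands concurrently, one worker thread per command."""
    with ThreadPoolExecutor(max_workers=len(cmds)) as pool:
        # pool.map returns results in the same order as cmds
        return list(pool.map(timeit, cmds))


# Placeholder commands (assumption): replace with cmd1 and cmd2 from the question.
demo_cmd1 = [sys.executable, "-c", "print('old')"]
demo_cmd2 = [sys.executable, "-c", "print('new')"]

if __name__ == "__main__":
    for out, seconds in run_in_parallel([demo_cmd1, demo_cmd2]):
        print(out.decode().strip(), f"{seconds:.2f}s")
```

Because timeit now receives a plain command list, each worker starts its own process and times only that process.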
ETA: While the above solution is the way to do multiprocessing in general, @Jordan is exactly right that you shouldn’t use this approach to time two versions of software. Why not run them sequentially?
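For the sequential route, a rough sketch could look like the following. It assumes wall-clock time per command is the number you care about and that a non-zero exit status should abort the comparison; time_sequentially, old_cmd and new_cmd are hypothetical names, and the commands shown are placeholders for cmd1 and cmd2.

```python
import subprocess
import sys
import time


def time_sequentially(cmds):
    """Run each command to completion before starting the next; return seconds per command."""
    timings = []
    for cmd in cmds:
        start = time.perf_counter()
        # check=True raises CalledProcessError if the command exits non-zero
        subprocess.run(cmd, check=True, stdout=subprocess.DEVNULL)
        timings.append(time.perf_counter() - start)
    return timings


# Placeholder commands (assumption): replace with cmd1 and cmd2 from the question.
old_cmd = [sys.executable, "-c", "pass"]
new_cmd = [sys.executable, "-c", "pass"]

if __name__ == "__main__":
    old_s, new_s = time_sequentially([old_cmd, new_cmd])
    print(f"old: {old_s:.2f}s  new: {new_s:.2f}s")
```

Only one command runs at a time here, so the two versions never compete with each other for CPU, memory, or disk while being measured.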