Hi, I have a question about linking input and output between subprocesses in Python. I am trying to simplify my program by passing the output of one step directly to another subprocess, rather than writing it to a file and then starting another process to run on that file.
E.g. First process uses SAMTOOLS to output a specific chromosome from a large bam file.
So…
bigfile.bam is read in and outputs chromosome22.bam
The next subprocess uses BEDTOOLS to convert that chromosome22.bam to chromosome22.bed
So…
chromosome22.bam is read in and outputs chromosome22.bed
What I want to do is pass the stdout of the first process into the second so there is no need for the intermediate file.
So far I have this…
    for x in 1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,'X','Y':
        subprocess.call("%s view -bh %s %s > %s/%s/%s.bam" % (samtools,bam,x,bampath,out,x), shell=True)
This makes the chromosome[1-22,X,Y].bam files. But can I avoid this and put another subprocess command in the same loop to convert them to bed files?
The command for bed conversion is:
bedpath/bedtools bamtobed -i [bamfile] > [bedfile]
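No need to use Python here. It is much easier in the shell, although essentially it works the same way in Python.
If bedtools can read from stdin, you can pipe the output of samtools straight into it and redirect only the final result to the bed file.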
Depending on how bedtools was designed, you might also need to use -i - to have it read from stdin.
If you stick with Python, I strongly recommend learning how to do this without shell=True. subprocess is safer to use when you use the array-based syntax and no shell. Make that two subprocess invocations, one for each command. See http://docs.python.org/library/subprocess.html#replacing-shell-pipeline for more details.
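The two-invocation approach could look roughly like the sketch below. It is only an illustration under some assumptions: the helper names (pipe_to_file, chromosome_commands, bam_to_bed_per_chromosome) and the outdir parameter are made up for this example, and the "-i -" argument is taken from the suggestion above, so check that your bedtools version accepts it for reading from stdin.

```python
import os
import subprocess


def pipe_to_file(first_cmd, second_cmd, out_path):
    """Run first_cmd | second_cmd > out_path without a shell or temp file."""
    with open(out_path, "wb") as out:
        first = subprocess.Popen(first_cmd, stdout=subprocess.PIPE)
        second = subprocess.Popen(second_cmd, stdin=first.stdout, stdout=out)
        # Close the parent's copy of the pipe so the first process receives
        # SIGPIPE if the second one exits early.
        first.stdout.close()
        second_rc = second.wait()
        first_rc = first.wait()
    if first_rc != 0 or second_rc != 0:
        raise RuntimeError(
            "pipeline failed: exit codes %d and %d" % (first_rc, second_rc)
        )


def chromosome_commands(samtools, bedtools, bam, chrom, stdin_arg="-"):
    """Build the two argument lists (hypothetical helper).

    stdin_arg is an assumption: the value that tells bedtools to read stdin.
    """
    first = [samtools, "view", "-bh", bam, str(chrom)]
    second = [bedtools, "bamtobed", "-i", stdin_arg]
    return first, second


def bam_to_bed_per_chromosome(samtools, bedtools, bam, outdir):
    """Write one .bed file per chromosome, with no intermediate .bam files."""
    chromosomes = list(range(1, 23)) + ["X", "Y"]
    for chrom in chromosomes:
        first, second = chromosome_commands(samtools, bedtools, bam, chrom)
        pipe_to_file(first, second, os.path.join(outdir, "%s.bed" % chrom))
```

Passing each command as a list means no shell is involved, so file names containing spaces or shell metacharacters are handed to the tools unchanged. The output redirection is done by opening the bed file in Python and giving it to the second process as stdout.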