In C# (.NET 4.0 running under Mono 2.8 on SuSE) I would like to run an external batch command and capture its ouput in binary form. The external tool I use is called ‘samtools’ (samtools.sourceforge.net) and among other things it can return records from an indexed binary file format called BAM.
I use Process.Start to run the external command, and I know that I can capture its output by redirecting Process.StandardOutput. The problem is, that’s a text stream with an encoding, so it doesn’t give me access to the raw bytes of the output. The almost-working solution I found is to access the underlying stream.
Here’s my code:
Process cmdProcess = new Process();
ProcessStartInfo cmdStartInfo = new ProcessStartInfo();
cmdStartInfo.FileName = "samtools";
cmdStartInfo.RedirectStandardError = true;
cmdStartInfo.RedirectStandardOutput = true;
cmdStartInfo.RedirectStandardInput = false;
cmdStartInfo.UseShellExecute = false;
cmdStartInfo.CreateNoWindow = true;
cmdStartInfo.Arguments = "view -u " + BamFileName + " " + chromosome + ":" + start + "-" + end;
cmdProcess.EnableRaisingEvents = true;
cmdProcess.StartInfo = cmdStartInfo;
cmdProcess.Start();
// Prepare to read each alignment (binary)
var br = new BinaryReader(cmdProcess.StandardOutput.BaseStream);
while (!cmdProcess.StandardOutput.EndOfStream)
{
// Consume the initial, undocumented BAM data
br.ReadBytes(23);
// … more parsing follows
But when I run this, the first 23bytes that I read are not the first 23 bytes in the ouput, but rather somewhere several hundred or thousand bytes downstream. I assume that StreamReader does some buffering and so the underlying stream is already advanced say 4K into the output. The underlying stream does not support seeking back to the start.
And I’m stuck here. Does anyone have a working solution for running an external command and capturing its stdout in binary form? The ouput may be very large so I would like to stream it.
Any help appreciated.
By the way, my current workaround is to have samtools return the records in text format, then parse those, but this is pretty slow and I’m hoping to speed things up by using the binary format directly.
Using
StandardOutput.BaseStreamis the correct approach, but you must not use any other property or method ofcmdProcess.StandardOutput. For example, accessingcmdProcess.StandardOutput.EndOfStreamwill cause theStreamReaderforStandardOutputto read part of the stream, removing the data you want to access.Instead, simply read and parse the data from
br(assuming you know how to parse the data, and won’t read past the end of stream, or are willing to catch anEndOfStreamException). Alternatively, if you don’t know how big the data is, useStream.CopyToto copy the entire standard output stream to a new file or memory stream.