I am running a script to parse text email files that can be called by MATLAB or run from the command line. The script looks like this:
#!/bin/bash
MYSED=/opt/local/bin/gsed
"$MYSED" -n "/X-FileName/,/*/p" | "$MYSED" "/X-FileName/d" | "$MYSED" "/\-Original Message\-/q"
If I run cat message_file | ./parser.sh in my Terminal window, I get a parsed text file. If I do the same using the system command in MATLAB, I occasionally get the same parsed text followed by the error message
cat: stdout: Broken pipe
When I was using a sed command instead of a cat command, I was getting the same error message. This happens maybe on 1 percent of the files I am parsing, almost always large files where a lot gets deleted after the Original Message line. I do not get the error when I do not include the last pipe, the one deleting everything after ‘Original Message’.
I would like to suppress the error message from cat if possible. Ideally, I would also like to understand why running the script through MATLAB gives me an error while running it in Terminal does not. Since it tends to happen on larger files, I am guessing it has to do with a memory limitation, but ‘broken pipe’ is such a vague error message that I can’t be sure. Any hints on either issue would be much appreciated.
I could probably run the script outside of MATLAB and save the processed files, but as some of the files are large I would much rather not duplicate them at this point.
The problem is occurring because of the final gsed command,
"$MYSED" "/\-Original Message\-/q". This (obviously) quits as soon as it sees a match. If the gsed feeding it tries to write anything after that, it’ll receive SIGPIPE and quit; if there’s enough data, the same will happen to the first gsed; and if there’s enough data after that, SIGPIPE will be sent to the original cat command, which reports the error. Whether or not the error makes it back to cat will depend on timing, buffering, the amount of data, the phase of the moon, etc.
My first suggestion would be to put the
"$MYSED" "/\-Original Message\-/q" command at the beginning of the pipeline, and have it do the reading from the file (rather than feeding it from cat). This’d mean changing the script to accept the file to read from as an argument: … and then run it with
./parser.sh message_file. If my assumptions about the message file format are right, changing the order of the gsed commands this way shouldn’t cause trouble. Is there any reason the message file needs to be piped to stdin rather than passed as an argument and read directly?
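As a rough sketch of that reordering (not a tested drop-in replacement), the script might look something like the following. The three gsed expressions are copied unchanged from the question and only their order differs. The function name parse_message, the fallback to plain sed when gsed is not at that path, and the guard around the final call are my own additions for illustration. It also assumes the "Original Message" line never appears before the X-FileName header.

```shell
#!/bin/bash
# Sketch of parser.sh with the quitting sed moved to the front of the pipeline.
MYSED=/opt/local/bin/gsed
# Assumption for illustration: fall back to plain sed if gsed is not installed there.
[ -x "$MYSED" ] || MYSED=sed

# parse_message is a hypothetical helper name; $1 is the message file to read.
parse_message() {
    # Stage 1 reads the file itself and quits at the "Original Message" line.
    # Nothing is writing into it through a pipe, so nothing is left to get SIGPIPE.
    "$MYSED" "/\-Original Message\-/q" "$1" |
        "$MYSED" -n "/X-FileName/,/*/p" |   # keep from the X-FileName header onward
        "$MYSED" "/X-FileName/d"            # drop the header line itself
}

# Only run when a file name was actually passed on the command line.
if [ "$#" -gt 0 ]; then
    parse_message "$1"
fi
```

Because the first gsed stops reading the file when it quits, the later stages simply see end of input and finish normally, rather than an upstream writer being killed mid-write.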