I’m trying to write a shell script that will execute a Hadoop MapReduce job on a pseudo-distributed cluster, but omit all output not preceded by a !. I tried piping the output to awk and filtering it that way, which worked for most of the output, but I’m still getting output from the JobClient to Terminal. Is there a way to prevent this?
My code currently looks like this:
#!/bin/sh
runtimes=$1
# run the job $runtimes times
for i in $(seq 1 "$runtimes")
do
cd ~/Documents/hadoop-1.0.3
bin/hadoop dfs -rmr /SC_out | awk "{}"
bin/hadoop jar ../MapReduceTests/SyntaxCounter.jar mrt.SyntaxCounter /WC_in/ /SC_out/ | awk "{}"
bin/hadoop dfs -cat /SC_out/part* | awk "\$0~/!Map/ {print \$0}"
done
EDIT: This is the kind of output I’m looking to suppress:
12/08/15 16:45:17 INFO mapred.JobClient: Running job: job_201208151042_0128
12/08/15 16:45:18 INFO mapred.JobClient: map 0% reduce 0%
12/08/15 16:45:31 INFO mapred.JobClient: map 100% reduce 0%
12/08/15 16:45:43 INFO mapred.JobClient: map 100% reduce 100%
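This output is written to stderr, not stdout, so piping into awk never sees it. Redirect stderr as well: put 2>/dev/null before the pipe to discard it, or 2>&1 to merge it into stdout so awk can filter it along with everything else.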
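A minimal, self-contained illustration of the stdout/stderr split. `fake_job` is a hypothetical stand-in for `bin/hadoop jar ...`. It assumes, as the log above suggests, that the JobClient's INFO lines go to stderr while the job's own output goes to stdout:

```shell
#!/bin/sh
# fake_job is a hypothetical stand-in for "bin/hadoop jar ...": it writes a
# JobClient-style log line to stderr and its results to stdout.
fake_job() {
  echo "12/08/15 16:45:17 INFO mapred.JobClient: Running job: job_1" >&2
  echo "!Map keep me"
  echo "other line"
}

# "fake_job | awk '/!Map/'" would filter stdout only; the INFO line on stderr
# would still reach the terminal. Discarding stderr before the pipe hides it too.
filtered=$(fake_job 2>/dev/null | awk '/!Map/')
echo "$filtered"   # prints: !Map keep me
```

Applied to the script in the question, the job invocation could become `bin/hadoop jar ../MapReduceTests/SyntaxCounter.jar mrt.SyntaxCounter /WC_in/ /SC_out/ > /dev/null 2>&1`, which silences both streams without needing `awk "{}"`.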
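Or, more simply, if the driver uses the new API (org.apache.hadoop.mapreduce.Job), submit the job with the verbose parameter set to false, i.e. call job.waitForCompletion(false) instead of job.waitForCompletion(true), so the JobClient never prints the progress lines in the first place.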