I would like to know how to specify mapreduce configurations such as mapred.task.timeout ,

Asked: June 13, 2026 (2026-06-13T23:15:41+00:00)

I would like to know how to specify MapReduce configurations such as mapred.task.timeout, mapred.min.split.size, etc., when running a streaming job using a custom jar.

When we run a streaming job with scripts written in languages like Ruby or Python, we can specify these configurations as follows:

ruby elastic-mapreduce -j --stream --step-name "mystream" --jobconf mapred.task.timeout=0 --jobconf mapred.min.split.size=52880 --mapper s3://somepath/mapper.rb --reducer s3://somepath/reducer.rb --input s3://somepath/input --output s3://somepath/output

I tried the following ways, but none of them worked:

ruby elastic-mapreduce --jobflow --jar s3://somepath/job.jar --arg s3://somepath/input --arg s3://somepath/output --args -m,mapred.min.split.size=52880 -m,mapred.task.timeout=0
ruby elastic-mapreduce --jobflow --jar s3://somepath/job.jar --arg s3://somepath/input --arg s3://somepath/output --args -jobconf,mapred.min.split.size=52880 -jobconf,mapred.task.timeout=0

I would also like to know how to pass Java options to a streaming job that uses a custom jar in EMR.
When running locally on Hadoop, we can pass them as follows:

bin/hadoop jar job.jar input_path output_path -D<some_java_parameter>=<some_value>


1 Answer

Editorial Team · Answer 1 · 2026-06-13T23:15:42+00:00

I believe that to set these on a per-job basis, you need to do one of the following:

A) For custom jars, pass them into your jar as arguments and process them yourself. I believe this can be automated as follows:

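import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.util.GenericOptionsParser;

public static void main(String[] args) throws Exception {
  Configuration conf = new Configuration();
  // Applies generic options such as -D key=value to conf and returns the other arguments.
  args = new GenericOptionsParser(conf, args).getRemainingArgs();
  //....
}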

Then create the job like this (I haven't verified that it works):

> elastic-mapreduce --jar s3://mybucket/mycode.jar \
    --args "-D,mapred.reduce.tasks=0" \
    --arg s3://mybucket/input \
    --arg s3://mybucket/output

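The GenericOptionsParser should automatically apply the -D parameters to the Configuration object, so they take effect as long as the job is created from that conf. Note that -jobconf is an option of the Hadoop streaming jar, not of GenericOptionsParser, so a custom jar should be given -D instead, and generic options such as -D should come before the job's own arguments. More details: http://hadoop.apache.org/docs/r0.20.0/api/org/apache/hadoop/util/GenericOptionsParser.html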
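To make the argument handling concrete, here is a rough stdlib-only sketch of the idea: leading -D key=value pairs are collected as properties, and everything from the first other argument onward is handed back as the remaining args. The class LeadingDOptions and its fields are hypothetical names made up for this illustration. It is not Hadoop's implementation, and it assumes the EMR CLI passes --args and --arg values to main in the order given.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for illustration only; this is NOT Hadoop's GenericOptionsParser.
final class LeadingDOptions {
    public final Map<String, String> properties = new LinkedHashMap<>();
    public final List<String> remaining = new ArrayList<>();

    LeadingDOptions(String[] args) {
        int i = 0;
        // Consume "-D key=value" or "-Dkey=value" until the first argument that is not -D.
        while (i < args.length && args[i].startsWith("-D")) {
            String pair;
            if (args[i].equals("-D")) {
                if (i + 1 >= args.length) {
                    break; // dangling -D with no value: leave it in the remaining args
                }
                pair = args[i + 1];
                i += 2;
            } else {
                pair = args[i].substring(2);
                i += 1;
            }
            int eq = pair.indexOf('=');
            if (eq < 0) {
                properties.put(pair, "");
            } else {
                properties.put(pair.substring(0, eq), pair.substring(eq + 1));
            }
        }
        // Everything from the first non-option onward is returned untouched.
        remaining.addAll(Arrays.asList(args).subList(i, args.length));
    }

    public static void main(String[] argv) {
        // Assumed argument list produced by: --args "-D,mapred.reduce.tasks=0" --arg input --arg output
        String[] emrArgs = {
            "-D", "mapred.reduce.tasks=0", "s3://mybucket/input", "s3://mybucket/output"
        };
        LeadingDOptions parsed = new LeadingDOptions(emrArgs);
        System.out.println("properties: " + parsed.properties);
        System.out.println("remaining:  " + parsed.remaining);
    }
}
```

In this sketch, a -D placed after the input and output paths, or a -jobconf pair, is not picked up as a property and simply ends up in the remaining arguments. That may be what happened with the attempts in the question.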

B) For the Hadoop streaming jar, you likewise pass the configuration change on the command line:

> elastic-mapreduce --jobflow j-ABABABABA \
   --stream --jobconf mapred.task.timeout=600000 \
   --mapper s3://mybucket/mymapper.sh \
   --reducer s3://mybucket/myreducer.sh \
   --input s3://mybucket/input \
   --output s3://mybucket/output \
   --jobconf mapred.reduce.tasks=0

More details: https://forums.aws.amazon.com/thread.jspa?threadID=43872 and elastic-mapreduce --help
