The data looks like this, first field is a number, 3 … 1 …

Question

Asked: June 14, 2026 (2026-06-14T04:25:03+00:00)

The data looks like this; the first field is a number:

3 ...
1 ...
2 ...
11 ...

I want to sort these lines by the first field numerically instead of alphabetically, so that after sorting the output looks like this:

1 ...
2 ...
3 ...
11 ...

But Hadoop keeps giving me this:

1 ...
11 ...
2 ...
3 ...

How do I correct it?

1 Answer

Editorial Team · Answer 1 · 2026-06-14T04:25:04+00:00

Assuming you are using Hadoop Streaming, you need to use the KeyFieldBasedComparator class.

Add -D mapred.output.key.comparator.class=org.apache.hadoop.mapred.lib.KeyFieldBasedComparator to the streaming command.
Specify the type of sorting required with mapred.text.key.comparator.options. Some useful options are -n (numeric sort) and -r (reverse sort).
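
To make the difference concrete, here is a small local sketch in plain Python (not Hadoop code) that contrasts text ordering with numeric ordering on the keys from the question. It assumes the keys are compared as plain ASCII strings by default, which matches the output shown in the question; the variable names are only for illustration.

```python
# Keys from the question, as they arrive: plain text, not numbers.
keys = ["3", "1", "2", "11"]

# Text comparison works character by character, so "11" sorts before "2".
lexicographic = sorted(keys)

# Numeric comparison (what the -n option requests) compares the values.
numeric = sorted(keys, key=int)

print(lexicographic)  # ['1', '11', '2', '3']
print(numeric)        # ['1', '2', '3', '11']
```

The first list is the order the question complains about; the second is the order the -n option is meant to produce.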

EXAMPLE :

Create an identity mapper and reducer with the following code. The same script is used as both mapper.py and reducer.py:

#!/usr/bin/env python
import sys

# Identity step: echo each input line, stripped of surrounding whitespace.
for line in sys.stdin:
    print(line.strip())

This is the input.txt

This is the Streaming command

$HADOOP_HOME/bin/hadoop jar $HADOOP_HOME/hadoop-streaming.jar \
-D mapred.output.key.comparator.class=org.apache.hadoop.mapred.lib.KeyFieldBasedComparator \
-D mapred.text.key.comparator.options=-n \
-input /user/input.txt \
-output /user/output.txt \
-file ~/mapper.py \
-mapper ~/mapper.py \
-file ~/reducer.py \
-reducer ~/reducer.py

This produces the required output.

NOTE :

I have used a simple single-key input. If you have multiple keys and/or partitions, you will have to edit mapred.text.key.comparator.options as needed. Since I do not know your use case, my example is limited to this.
An identity mapper is needed since you need at least one mapper for an MR job to run.
An identity reducer is needed since the shuffle/sort phase will not run in a pure map-only job.
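
For the multiple-key case mentioned in the note, it can help to work out the intended order locally before choosing comparator options. The sketch below is plain Python, not Hadoop code, and the records are hypothetical: it assumes two whitespace-separated numeric fields, ordered by the first field ascending and the second descending. On the Hadoop side this would roughly correspond to sort-style options such as -k1,1n -k2,2nr, assuming the job is also configured to treat both fields as part of the key.

```python
# Hypothetical records with two numeric key fields separated by whitespace.
records = ["2 5", "1 7", "11 3", "1 9", "2 1"]


def sort_key(line):
    """Order by field 1 ascending, then field 2 descending, both numeric."""
    first, second = line.split()[:2]
    # Negating the second value reverses its order within equal first fields.
    return (int(first), -int(second))


ordered = sorted(records, key=sort_key)
for line in ordered:
    print(line)
```

Comparing the job's output against a local ordering like this is a quick way to check that the comparator options do what was intended.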
