After the map phase and before the reduce phase, there is a sort phase. In streaming mode, how does Hadoop know what your key type is, and how does it sort the keys?
For example, given an input file in this format:
1990 1
1991 4
1992 5
...
The map output has the keys 1990, 1991, 1992, ... How does Hadoop sort them: numerically or alphabetically?
StreamJob takes the map output key/value classes from IdentifierResolver. Unless the stream.map.output property is set to rawbytes or typedbytes, IdentifierResolver#resolve sets both the map output key class and value class to Text.class. In Text.java, the nested Comparator class extends WritableComparator and implements a compare method that compares keys in lexicographic (dictionary, or alphabetical) order. So by default the keys are sorted as strings, not as numbers.
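To make the difference concrete, here is a minimal Python sketch that imitates the two orderings outside of Hadoop. It does not call any Hadoop code: the helper names text_order and numeric_order and the sample keys are made up for illustration, and sorting on the UTF-8 bytes of each key is only an approximation of what the Text comparator does.

```python
def text_order(keys):
    """Sort keys byte by byte, roughly how a lexicographic comparator orders them."""
    return sorted(keys, key=lambda k: k.encode("utf-8"))


def numeric_order(keys):
    """Sort keys by their integer value, for comparison."""
    return sorted(keys, key=int)


# Keys of equal length, like the four-digit years in the question.
years = ["1992", "1990", "1991"]
# Hypothetical keys of different lengths, where the two orderings diverge.
mixed = ["1990", "200", "9", "1991"]

print(text_order(years))     # ['1990', '1991', '1992']
print(text_order(mixed))     # ['1990', '1991', '200', '9']
print(numeric_order(mixed))  # ['9', '200', '1990', '1991']
```

For four-digit years the lexicographic order happens to match the numeric order, so the output looks numerically sorted. With keys of different lengths it does not, unless the keys are zero-padded to a fixed width or a different comparator is configured.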
Check the StreamJob, IdentifierResolver and Text classes.