so when we use Java for writing map/reduce program, the map collects the data

Question

Asked: May 26, 2026 (2026-05-26T00:12:49+00:00)

So when we use Java to write a map/reduce program, the map collects the data and the reducer receives the list of values per key, like

Map(k, v) -> k1, v1  
    then shuffle and sort happens  
    then reducer gets it  

reduce(k1, List<values>)

to work on. But is it possible to do the same with Python using streaming? I used this as a reference, and it seems the reducer gets its data one line at a time, as supplied on the command line.

1 Answer

Editorial Team · Answer 1 · 2026-05-26T00:12:50+00:00

In Hadoop Streaming, the mapper writes key-value pairs to sys.stdout. Hadoop does the shuffle and sort and feeds the results to the reducer on sys.stdin. How you actually handle the map and the reduce is entirely up to you, so long as you follow that model (map writes to stdout, reduce reads from stdin). This is why the scripts can be tested independently of Hadoop by running cat data | map | sort | reduce on the command line.
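To make that model concrete, here is a minimal sketch that simulates the cat data | map | sort | reduce pipeline in a single process. The word-count task, the tab-separated line format, and the names mapper, reducer, and run_local are assumptions chosen for illustration; they are not something Hadoop Streaming requires.

```python
def mapper(lines):
    """Emit one 'word<TAB>1' line per word (assumed word-count example)."""
    for line in lines:
        for word in line.split():
            yield f"{word}\t1"


def reducer(sorted_lines):
    """Sum the counts of consecutive lines that share a key.

    This only works because the input arrives sorted by key,
    which is what the shuffle and sort step guarantees.
    """
    current_key, total = None, 0
    for line in sorted_lines:
        key, value = line.rstrip("\n").split("\t", 1)
        if key != current_key:
            # Key changed: flush the finished group before starting a new one.
            if current_key is not None:
                yield f"{current_key}\t{total}"
            current_key, total = key, 0
        total += int(value)
    # Flush the last group, if there was any input at all.
    if current_key is not None:
        yield f"{current_key}\t{total}"


def run_local(lines):
    """In-process stand-in for: cat data | map | sort | reduce."""
    return list(reducer(sorted(mapper(lines))))
```

In a real streaming job the two functions would live in separate scripts, each reading sys.stdin and printing its output lines to sys.stdout; the logic stays the same.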

The input to the reducer is the same key-value pairs that the mapper emitted, but it arrives sorted by key, one pair per line. You can iterate through the lines and accumulate totals as the example demonstrates, or you can take it further and pass the input to itertools.groupby(), which gives you the equivalent of the k1, List<values> input that you are used to and works well with reduce() (functools.reduce in Python 3).
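As a rough sketch of the groupby approach, the following turns sorted 'key<TAB>value' lines into (key, values) groups and then folds each group with functools.reduce. The tab-separated format, the integer values, and the helper names parse, grouped, and reduce_sum are assumptions made for this example.

```python
from functools import reduce
from itertools import groupby
from operator import itemgetter


def parse(lines):
    """Split each 'key<TAB>value' line into a (key, value) tuple."""
    for line in lines:
        key, value = line.rstrip("\n").split("\t", 1)
        yield key, value


def grouped(sorted_lines):
    """Yield (key, values) pairs, similar to Java's reduce(k1, List<values>).

    groupby only merges adjacent items, so the input must be sorted by key.
    Each values iterator must be consumed before moving to the next group.
    """
    for key, pairs in groupby(parse(sorted_lines), key=itemgetter(0)):
        yield key, (value for _, value in pairs)


def reduce_sum(sorted_lines):
    """Example reduce step: sum the integer values of each key."""
    for key, values in grouped(sorted_lines):
        total = reduce(lambda acc, v: acc + int(v), values, 0)
        yield f"{key}\t{total}"
```

A reducer script could then loop over reduce_sum(sys.stdin) and print each result line.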

The point being that it’s up to you to implement the reduce.
