Good day… I am a bit confused; what is the difference between a reduce

Question

Asked: June 2, 2026 (2026-06-02T02:05:36+00:00)

Good day,
I am a bit confused: what is the difference between a reduce task and a reduce job?
Here is my case: I have read that reduce does not start until all mapping is finished,
but in the Hadoop output I see otherwise:

12/02/11 10:58:50 INFO mapred.JobClient: map 60% reduce 16%
12/02/11 10:58:54 INFO mapred.JobClient: map 60% reduce 20%
12/02/11 10:58:55 INFO mapred.JobClient: map 65% reduce 20%

The reduce is already at 16% while the map is still at 60%.
What is really happening here?


1 Answer

Editorial Team · Answer 1 · 2026-06-02T02:05:37+00:00

The “reduce phase” is really three steps: shuffle, sort, and reduce. The shuffle copies the map output over to the reducers, and the sort groups the keys together. The reduce is the actual reduce function that you wrote.

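The percentages work like this: shuffle accounts for the first 33%, sort for the next 33%, and reduce for the final 33%. What you are seeing at “reduce 16%” is that about 16/33 (roughly 48%) of the data has been copied over to the reducers. The shuffle can run while mappers are still working, but the final 33% (your actual reduce function) can’t start until all the mappers are done.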
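To make the arithmetic concrete, here is a minimal Python sketch that turns a reported reduce percentage into a step name and the progress within that step. It assumes each of the three steps counts for exactly one third of the reported figure, as described above; the function name `reduce_phase_progress` is made up for illustration and is not part of Hadoop.

```python
def reduce_phase_progress(reported_percent):
    """Split a reported "reduce N%" figure into (step, fraction done within that step).

    Assumption: shuffle, sort, and reduce each account for one third of the
    reported percentage. This is an illustrative helper, not a Hadoop API.
    """
    if not 0 <= reported_percent <= 100:
        raise ValueError("reported_percent must be between 0 and 100")

    steps = ("shuffle", "sort", "reduce")
    share = 100.0 / len(steps)  # each step covers about 33.3 points

    # Find which third the reported value falls in; clamp so 100% maps to "reduce".
    index = min(int(reported_percent // share), len(steps) - 1)

    # Progress inside that step, as a fraction between 0 and 1.
    within = (reported_percent - index * share) / share
    return steps[index], within


if __name__ == "__main__":
    # Reduce percentages taken from the log lines in the question.
    for reported in (16, 20):
        step, within = reduce_phase_progress(reported)
        print(f"reduce {reported}% -> {step} is about {within:.0%} done")
```

For the two values in the log, this reports the shuffle at about 48% and then about 60%, which matches the reading above: the reducers are still copying map output, and the reduce function itself has not started.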
