Say we are processing on a node, and the keys waiting to be processed are 2, 1, 3.
In preprocessing, the keys will be sorted to 1, 2, 3.
The processing sequence will then be:
begin processing 1
processing 1 done
begin processing 2
processing 2 done
begin processing 3
processing 3 done
Can I emit something with key 2 while processing key 1? Then the emitted record would be processed when key 2 is processed.
I think this does not conflict with the concept of parallel processing, because keys on the same node are processed in sequence.
No, because the partitioning step has already happened, so any output from your reduce steps will go to the destination folder, not back into the input folder:
partitioned input => reducers => output
You could always run a second MapReduce job with an identity mapper and the same reducer.
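
To make the two-job idea concrete, here is a rough single-process simulation in Python. It is not Hadoop code: `run_job`, `identity_mapper`, and `reducer` are hypothetical names made up for this sketch, and the reducer (sum the values, and emit an extra record for key 2 while handling key 1) is only an assumed example. It shows that the record emitted for key 2 lands in the first job's output and is only grouped under key 2 once a second job reads that output.

```python
from collections import defaultdict


def run_job(records, mapper, reducer):
    """Simulate one MapReduce job on one node: map, group, sort keys, reduce."""
    groups = defaultdict(list)
    for key, value in records:
        for out_key, out_value in mapper(key, value):
            groups[out_key].append(out_value)
    # Grouping is fixed here, before any reducer runs.
    output = []
    for key in sorted(groups):
        output.extend(reducer(key, groups[key]))
    return output


def identity_mapper(key, value):
    # Passes each record through unchanged.
    yield key, value


def reducer(key, values):
    # Hypothetical reducer: sums the values for a key.
    yield key, sum(values)
    if key == 1:
        # Emitting under key 2 while processing key 1: this goes to the
        # job output, not into the group the reducer will see for key 2.
        yield 2, 100


first_input = [(2, 20), (1, 10), (3, 30)]

first_output = run_job(first_input, identity_mapper, reducer)
print(first_output)   # [(1, 10), (2, 100), (2, 20), (3, 30)]

# Second job: identity mapper plus the same reducer, reading the first output.
second_output = run_job(first_output, identity_mapper, reducer)
print(second_output)  # [(1, 10), (2, 100), (2, 120), (3, 30)]
```

In the first output the emitted `(2, 100)` sits next to `(2, 20)` without being combined; only the second job merges them into `(2, 120)`. Note that reusing the same reducer also emits `(2, 100)` again in the second pass, so a real job would probably need some guard against that.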