MongoDB’s explanation of the reduce phase says:
The map/reduce engine may invoke reduce functions iteratively; thus,
these functions must be idempotent.
This is how I always understood reduce to work in a general map reduce environment.
Here you could sum values across N machines by reducing the values on each machine, then sending those outputs to another reducer.
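To make that concrete, here is a minimal Python sketch of the iterative reading, not MongoDB's actual engine. The names `reduce_sum` and `machine_values` and the key `"clicks"` are made up for illustration. The point is only that the output of one reduce call can be fed into another reduce call as an ordinary value:

```python
def reduce_sum(key, values):
    """Reduce function whose output can be fed back in as an input value."""
    return sum(values)

# Hypothetical values for a single key, spread over three machines.
machine_values = [[1, 2, 3], [4, 5], [6]]

# Step 1: each machine reduces its own values locally.
partials = [reduce_sum("clicks", vals) for vals in machine_values]

# Step 2: another reducer combines the partial results.
total = reduce_sum("clicks", partials)

# Reducing everything in one place gives the same answer.
all_values = [v for vals in machine_values for v in vals]
assert total == reduce_sum("clicks", all_values)
```

This only works because summing partial sums equals summing the raw values; a reduce function without that property would break under repeated invocation.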
Wikipedia’s description of the reduce step, however, says:
The framework calls the application’s Reduce function once for each
unique key in the sorted order. The Reduce can iterate through the
values that are associated with that key and produce zero or more
outputs.
Here you would need to move all values (with the same key) to the same machine to be summed. Moving data to the function seems to be the opposite of what map reduce is supposed to do.
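A rough Python sketch of this once-per-key reading, assuming a toy in-memory "framework" (the helpers `shuffle`, `run_reduce`, and `count_reduce` are hypothetical names, not part of any real library). All values for a key are gathered in one place first, and only then is the reduce function called, exactly once per key, in sorted key order:

```python
from collections import defaultdict

def shuffle(pairs):
    """Gather every value emitted for a key into one list."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def run_reduce(pairs, reduce_fn):
    """Call reduce_fn exactly once per unique key, in sorted key order."""
    grouped = shuffle(pairs)
    return {key: reduce_fn(key, grouped[key]) for key in sorted(grouped)}

def count_reduce(key, values):
    # Safe only because this function sees ALL values for the key at once.
    return len(values)

pairs = [("b", 1), ("a", 1), ("b", 1)]
counts = run_reduce(pairs, count_reduce)
```

Note that `count_reduce` would give wrong results if the framework fed its own outputs back into it, since `len` of a list of partial counts is not the total count. Under the once-per-key model that is never a problem; under the iterative model it is.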
Is Wikipedia’s description too specific? Or did MongoDB break map-reduce? (Or am I missing something here?)
This is how the original Map Reduce framework was described by Google:
And later:
So there is only one invocation of Reduce per key. The problem of moving a lot of small intermediate pairs is addressed by using a special combiner function locally:
TL;DR
Wikipedia follows the original MapReduce design; MongoDB’s designers took a slightly different approach.
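For illustration, the combiner idea can be sketched in Python roughly like this. It is a toy model under my own assumptions, not Google's or MongoDB's code, and `combine_locally`, `reduce_once`, and `machine_outputs` are made-up names. Each machine pre-aggregates its own map output, so fewer pairs cross the network, and Reduce still runs only once per key:

```python
from collections import defaultdict

def combine_locally(pairs):
    """Combiner: pre-aggregate one machine's map output before sending it."""
    local = defaultdict(int)
    for key, value in pairs:
        local[key] += value
    return list(local.items())

def reduce_once(key, values):
    """Reduce: called a single time per key on the combined values."""
    return sum(values)

# Hypothetical word-count map output on two machines (6 pairs in total).
machine_outputs = [
    [("the", 1), ("the", 1), ("cat", 1)],
    [("the", 1), ("cat", 1), ("cat", 1)],
]

# Each machine runs the combiner on its own output (4 pairs remain).
combined = [combine_locally(out) for out in machine_outputs]

# Shuffle: group the combined pairs by key, then reduce once per key.
grouped = defaultdict(list)
for out in combined:
    for key, value in out:
        grouped[key].append(value)
result = {key: reduce_once(key, grouped[key]) for key in sorted(grouped)}
```

Seen this way, the two designs are close: MongoDB's re-invoked reduce plays the role that the separate combiner plays in the original design, which is why it has to accept its own output as input.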