For any key used in a MapReduce operation, the elements with that key may follow some natural ordering.
Suppose we want to find elements $e_0$ and $e_1$ such that:
- both belong to the same key,
- they follow some ordering $e_0 < e_1$,
- there is no element $e_n$ where $e_0 < e_n < e_1$ with respect to our ordering,
- some relation between $e_0$ and $e_1$ holds.
(How) can that be done efficiently using MapReduce?
The usual database way of solving this is to open a cursor over the collection, sorted by our ordering, keep track of the last-seen element and the current element, and test whether the relation holds between them.
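A minimal in-memory sketch of that cursor approach, in Python (the question itself has no code). The sorted list stands in for the database cursor; `successive_pairs`, the `relation` predicate, the `(key, element)` record shape and the sample data are all illustrative assumptions, not part of any database API.

```python
from itertools import groupby
from operator import itemgetter
from typing import Callable, Iterable, Iterator, Tuple, TypeVar

T = TypeVar("T")


def successive_pairs(
    records: Iterable[Tuple[str, T]],
    relation: Callable[[T, T], bool],
) -> Iterator[Tuple[str, T, T]]:
    """Yield (key, e0, e1) for successive same-key elements where relation(e0, e1) holds."""
    # Sorting by (key, element) stands in for a cursor ordered by our ordering.
    ordered = sorted(records)
    for key, group in groupby(ordered, key=itemgetter(0)):
        last_seen = None
        has_last = False
        for _, current in group:
            # last_seen and current are successive: nothing of this key sorts between them.
            if has_last and relation(last_seen, current):
                yield key, last_seen, current
            last_seen, has_last = current, True


# Hypothetical data and relation: successive elements more than 2 apart.
records = [("a", 5), ("b", 2), ("a", 1), ("a", 3), ("b", 9)]
print(list(successive_pairs(records, lambda e0, e1: e1 - e0 > 2)))
# [('b', 2, 9)]
```

The correctness of the scan rests entirely on seeing every element of a key, in order, in one pass.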
The problem with MapReduce is that, within a reduce call that reduces $e_0$ and $e_1$, there is no way to know whether some $e_n$ exists that breaks the assumption that $e_0$ and $e_1$ are successive.
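A small Python illustration of that failure mode, assuming a reduce call that receives only part of a key's values (for example, one step of a repeated reduce over partial results). The helper `adjacent_pairs` and the values are hypothetical.

```python
from typing import List, Tuple


def adjacent_pairs(values: List[int]) -> List[Tuple[int, int]]:
    """Pairs that look successive *within the values this call happens to see*."""
    ordered = sorted(values)
    return list(zip(ordered, ordered[1:]))


# All values emitted for one key.
all_values = [1, 2, 3]
# A reduce call that only sees a subset of them.
partial_values = [1, 3]

print(adjacent_pairs(all_values))      # [(1, 2), (2, 3)]
print(adjacent_pairs(partial_values))  # [(1, 3)]: wrong, since 2 lies between them
```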
Are there clever ways around this? Or are there MapReduce frameworks that can guarantee that the set of elements within a reduce call is sequential? Can it be done in MongoDB?
MapReduce is a paradigm for parallel programming. Amdahl's law limits the speedup achievable through parallelization to $1/(S + P/N)$, where $S$ and $P$ are the fractions of the code that are serial and parallel ($S + P = 1$) and $N$ is the number of processors. If $S = 1$, then $P = 0$ and the speedup is 1, i.e., there is no benefit (in terms of computation time) to using any number $N$ of processors. So if you have a "sequential" job (i.e., 100% non-parallel, like computing a non-associative reduction), MapReduce will never help. Note: maybe your problem is more parallel than you think.
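A quick numeric sketch of the formula above, in Python; `amdahl_speedup` is a hypothetical helper name and the serial fractions are arbitrary examples.

```python
def amdahl_speedup(serial_fraction: float, processors: int) -> float:
    """Amdahl's law: speedup = 1 / (S + P/N), with P = 1 - S."""
    if not 0.0 <= serial_fraction <= 1.0:
        raise ValueError("serial_fraction must be in [0, 1]")
    if processors < 1:
        raise ValueError("processors must be >= 1")
    parallel_fraction = 1.0 - serial_fraction
    return 1.0 / (serial_fraction + parallel_fraction / processors)


# A fully serial job (S = 1) never speeds up; S = 0.1 approaches but never exceeds 10x.
for n in (1, 8, 64, 1024):
    print(n, amdahl_speedup(1.0, n), round(amdahl_speedup(0.1, n), 2))
```

However large $N$ gets, the speedup stays below $1/S$, which is why shrinking the serial part matters more than adding processors.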