Is it possible to have multiple inputs with multiple different mappers in Hadoop MapReduce? Each mapper class works on a different set of inputs, but they would all emit key-value pairs consumed by the same reducer. Note that I’m not talking about chaining mappers here; I’m talking about running different mappers in parallel, not sequentially.
This is called a join.
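To make the idea concrete, here is a rough plain-Java sketch of what such a join does, with no Hadoop dependency at all. Everything in it is made up for illustration: the `users` and `orders` inputs, their line formats, the `U:`/`O:` value tags, and all class and method names are assumptions, and the `TreeMap` only stands in for the shuffle phase. The point is just that two different mappers emit pairs under a shared key and a single reducer sees both kinds of values for that key.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;

public class ReduceSideJoinSketch {

    // Hypothetical mapper for a "users" input; lines look like "userId,name".
    // Emits (userId, "U:name") so the reducer can tell where the value came from.
    static Map.Entry<String, String> userMapper(String line) {
        String[] f = line.split(",", 2);
        return Map.entry(f[0], "U:" + f[1]);
    }

    // Hypothetical mapper for an "orders" input; lines look like "orderId,userId,item".
    // Emits (userId, "O:item"), i.e. the same key space as userMapper.
    static Map.Entry<String, String> orderMapper(String line) {
        String[] f = line.split(",", 3);
        return Map.entry(f[1], "O:" + f[2]);
    }

    // Runs one mapper over one input and groups its output by key.
    // This stands in for the map and shuffle phases of a real job.
    static void mapAndShuffle(List<String> input,
                              Function<String, Map.Entry<String, String>> mapper,
                              Map<String, List<String>> groups) {
        for (String line : input) {
            Map.Entry<String, String> kv = mapper.apply(line);
            groups.computeIfAbsent(kv.getKey(), k -> new ArrayList<>()).add(kv.getValue());
        }
    }

    // The single reducer: it receives every value for one key, whichever mapper emitted it,
    // and pairs each user name with each ordered item.
    static List<String> reduce(String key, List<String> values) {
        List<String> names = new ArrayList<>();
        List<String> items = new ArrayList<>();
        for (String v : values) {
            if (v.startsWith("U:")) {
                names.add(v.substring(2));
            } else {
                items.add(v.substring(2));
            }
        }
        List<String> out = new ArrayList<>();
        for (String name : names) {
            for (String item : items) {
                out.add(key + "\t" + name + "\t" + item);
            }
        }
        return out;
    }

    // Feeds both inputs through their own mapper, then reduces each key group in key order.
    static List<String> join(List<String> users, List<String> orders) {
        Map<String, List<String>> groups = new TreeMap<>();
        mapAndShuffle(users, ReduceSideJoinSketch::userMapper, groups);
        mapAndShuffle(orders, ReduceSideJoinSketch::orderMapper, groups);
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, List<String>> e : groups.entrySet()) {
            out.addAll(reduce(e.getKey(), e.getValue()));
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> users = List.of("1,alice", "2,bob");
        List<String> orders = List.of("100,1,book", "101,2,pen", "102,1,lamp");
        join(users, orders).forEach(System.out::println);
    }
}
```

In a real job the two mappers run in parallel over their own input splits and the framework does the grouping; tagging each value with its source is a common way to let the one reducer tell the inputs apart.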
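One way is to use the mappers and reducers in the mapred.* packages (older, but still supported). In early releases of the newer API (mapreduce.*), a job could only be given a single mapper class; later Hadoop releases add an equivalent org.apache.hadoop.mapreduce.lib.input.MultipleInputs, so check what your version ships. With the mapred packages, you use the MultipleInputs class (org.apache.hadoop.mapred.lib.MultipleInputs) to register a separate mapper for each input path and so define the join: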