I’m rewriting a MongoDB map-reduce job to use Hadoop instead (using the mongo-hadoop connector), but when I map two datasets to the same collection, the output overwrites the existing values instead of reducing them together with the new ones. In MongoDB, the reduce output mode handles this:
{ reduce : "collectionName" }: If a document exists for a given key in both the result set and the old collection, then a reduce operation (using the specified reduce function) is performed on the two values and the result is written to the output collection. If a finalize function was provided, it is run after the reduce as well.
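To make the difference concrete, the behavior described in that quote can be sketched in plain Python. This is only an in-memory illustration of the merge semantics, not MongoDB or mongo-hadoop API code: `merge_with_reduce`, `existing`, and `new_results` are hypothetical names, and running finalize on every key the job writes is my assumption about the ordering.

```python
def merge_with_reduce(existing, new_results, reduce_fn, finalize_fn=None):
    """Simulate the reduce output mode on plain dicts (illustration only).

    existing    -- hypothetical stand-in for the old output collection
    new_results -- hypothetical stand-in for the new job's reduced results
    """
    merged = dict(existing)  # copy so the caller's dict is not mutated
    for key, value in new_results.items():
        if key in merged:
            # Key is present on both sides: re-reduce old and new values.
            value = reduce_fn(key, [merged[key], value])
        if finalize_fn is not None:
            # Assumption: finalize runs on every value the job writes.
            value = finalize_fn(key, value)
        merged[key] = value
    return merged


def sum_values(key, values):
    # Example reduce function: add up all values for a key.
    return sum(values)


existing = {"a": 3, "b": 1}
new_results = {"a": 2, "c": 5}

# What I am seeing now: the new value simply replaces the old one.
overwritten = {**existing, **new_results}

# What I want: shared keys are passed through the reduce function.
reduced = merge_with_reduce(existing, new_results, sum_values)

print(overwritten)
print(reduced)
```

Here the shared key `"a"` ends up as 2 in the overwrite case and 5 in the reduce case, while `"b"` and `"c"` are carried over unchanged in both.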
How is this done using mongo-hadoop?
To anyone else looking for this, support for multiple inputs is coming soon.
The branch with the change is located here. It’s pretty well done, and we’re using it in production.