I’m just learning mapReduce. I’m running the following map/reduce functions on a collection of users:
var m = function () {
    // key: the user's city; value: a count plus the user document itself
    emit(this.city, { num: 1, arr: this });
};

var r = function (key, arr_values) {
    // collect every value passed in, counting how many there were
    var resultArray = [];
    var count = 0;
    arr_values.forEach(function (value) {
        resultArray.push(value);
        count++;
    });
    return { num: count, arr: resultArray };
};

var res = db.AdsOnPage.mapReduce(m, r, { out: "ReducedCollection" });
This ends up giving me what I need: “city” as the key and an array of the users in that city as the value. But the users come back buried in an absurd number of nested arrays. I assume this happens as a result of sharding? If so, how do I rejoin everything? Right now, the results look something like this:
{
    "city" : "Chicago",
    "value" : {
        "num" : 2.0,
        "arr" : [{
            "num" : 2.0,
            "arr" : [{
                "num" : 1.0,
                "arr" : [{
                    <user doc is here>
                }]
            }, {
                "num" : 1.0,
                "arr" : [{
                    <user doc is here>
                }]
            }]
        }
        .......
...and so on, for many, many arrays.
Why is this happening? Is there any way to rejoin my results into a coherent single array?
This has nothing to do with sharding; it has to do with Map / Reduce logic.
The value from the map function needs to have the same shape as the return from reduce. Remember that reduce can be run multiple times. In fact, in the case of sharding, it will be run once for each shard and then again by the mongos making the request. (MongoDB also skips reduce entirely for a key that has only one emitted value, so that value lands in the output exactly as map emitted it.)

You’re thinking about what happens when you run reduce(key, [a,b,c]). For Map / Reduce to work, the output must be the same as either of the following:

reduce(key, [a, reduce(key, [b,c])])
OR
reduce(key, [reduce(key, [a,b]), c])

In your case, reduce(key, [b,c]) packs its inputs into a new array (the arr of {num: 2, arr: [b,c]}), so you get the following:

reduce(key, [a, reduce(key, [b,c])]) => reduce(key, [a, [b,c]])

Notice the extra array? That’s why you are getting nesting.
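The two call orders above can be compared directly. The sketch below is plain JavaScript meant for Node.js rather than the mongo shell: it takes the question's reduce unchanged and calls it once on all three values, then again in the re-reduce order. The sample values a, b and c (standing in for what the question's map emits) and the helper sameResult are hypothetical, not part of MongoDB.

```javascript
// The question's reduce, unchanged: every incoming value, including an
// earlier reduce result, is pushed into a brand-new array.
var r = function (key, arr_values) {
  var resultArray = [];
  var count = 0;
  arr_values.forEach(function (value) {
    resultArray.push(value);
    count++;
  });
  return { num: count, arr: resultArray };
};

// Hypothetical sample values shaped like the question's emit: {num: 1, arr: <user doc>}
var a = { num: 1, arr: { name: "Ann" } };
var b = { num: 1, arr: { name: "Bob" } };
var c = { num: 1, arr: { name: "Cy" } };

// Hypothetical helper: structural comparison of two reduce results
function sameResult(x, y) {
  return JSON.stringify(x) === JSON.stringify(y);
}

var allAtOnce = r("Chicago", [a, b, c]);                 // reduce called once
var reReduced = r("Chicago", [a, r("Chicago", [b, c])]); // reduce called twice

console.log(JSON.stringify(allAtOnce));
console.log(JSON.stringify(reReduced));
console.log("same result:", sameResult(allAtOnce, reReduced));
```

In reReduced the second element of arr is itself a {num, arr: [...]} object, which is the extra level seen in the question's output. num also comes out as 2 instead of 3, because count++ counts the values passed in rather than adding up their num fields.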
Solving this problem has two parts:

1. If the value (here, arr) is going to be an array, then emit should output an array with one item in it.
2. arr_values will be an “array of arrays”. You will have to combine them correctly.

Hopefully, that points you in the correct direction. For more detailed methods of debugging, you may want to look at the page on Troubleshooting M/R.
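Here is one way the two parts could look, as a minimal sketch: map emits arr as a one-item array, and reduce concatenates the incoming arr arrays and sums num (rather than counting calls, which breaks under re-reduce). Since emit only exists inside MongoDB, the sketch defines a stand-in emit plus hypothetical helpers reduceGroups and runLocal that imitate a per-shard reduce followed by a final reduce, so it runs in plain Node.js. The shard layout and user documents are made up for illustration. In the mongo shell only m and r would be needed, passed to db.AdsOnPage.mapReduce(m, r, { out: "ReducedCollection" }) as in the question.

```javascript
// Stand-in for MongoDB's emit(): inside mapReduce MongoDB supplies emit itself;
// here it just records (key, value) pairs so the sketch runs in Node.js.
var emitted = [];
function emit(key, value) {
  emitted.push({ key: key, value: value });
}

// Part 1: emit arr as a one-item array, the same shape reduce returns.
var m = function () {
  emit(this.city, { num: 1, arr: [this] });
};

// Part 2: each value carries its own arr array, so concatenate them
// (flattening one level) and add up num instead of counting values.
var r = function (key, arr_values) {
  var result = { num: 0, arr: [] };
  arr_values.forEach(function (value) {
    result.num += value.num;
    result.arr = result.arr.concat(value.arr);
  });
  return result;
};

// Hypothetical helper: group values by key and reduce each group, passing
// single values through untouched (MongoDB does not call reduce for those).
function reduceGroups(pairs, reduce) {
  var groups = {};
  pairs.forEach(function (p) {
    (groups[p.key] = groups[p.key] || []).push(p.value);
  });
  var out = {};
  Object.keys(groups).forEach(function (k) {
    var vals = groups[k];
    out[k] = vals.length === 1 ? vals[0] : reduce(k, vals);
  });
  return out;
}

// Hypothetical harness: map and reduce each "shard" separately, then
// reduce the per-shard results again, like a final merge step.
function runLocal(shards, map, reduce) {
  var merged = [];
  shards.forEach(function (docs) {
    emitted = [];
    docs.forEach(function (doc) { map.call(doc); });
    var partial = reduceGroups(emitted, reduce);
    Object.keys(partial).forEach(function (k) {
      merged.push({ key: k, value: partial[k] });
    });
  });
  return reduceGroups(merged, reduce);
}

// Made-up users split across two pretend shards
var shards = [
  [{ name: "Ann", city: "Chicago" }, { name: "Bob", city: "Chicago" }],
  [{ name: "Cy", city: "Chicago" }, { name: "Di", city: "Boston" }]
];
var result = runLocal(shards, m, r);
console.log(JSON.stringify(result, null, 2));
```

With these functions every Chicago user ends up in a single flat arr with num equal to 3. Plugging the question's original m and r into runLocal instead reproduces the nesting: the first shard's Chicago value is already {num: 2, arr: [...]}, and the final reduce wraps it in yet another array next to the second shard's lone {num: 1, arr: <user doc>}.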