I’m trying to understand a behavior with map/reduce.
Here’s the map function:
function() {
    var klass = this.error_class;
    emit('klass', { model : klass, count : 1 });
}
And the reduce function:
function(key, values) {
    var results = { count : 0, klass: { foo: 'bar' } };
    values.forEach(function(value) {
        results.count += value.count;
        results.klass[value.model] = 0;
        printjson(results);
    });
    return results;
}
Then I run it:
{
    "count" : 85,
    "klass" : {
        "foo" : "bar",
        "Twitter::Error::BadRequest" : 0
    }
}
{
    "count" : 86,
    "klass" : {
        "foo" : "bar",
        "Twitter::Error::BadRequest" : 0,
        "Stream:DirectMessage" : 0
    }
}
At this point everything is fine, but then comes the yielding of the read lock every 100 documents:
{
    "count" : 100,
    "klass" : {
        "foo" : "bar",
        "Twitter::Error::BadRequest" : 0,
        "Stream:DirectMessage" : 0
    }
}
{ "count" : 100, "klass" : { "foo" : "bar", "undefined" : 0 } }
I kept my key foo, and my count attribute kept being incremented. The problem is that everything else became undefined.
So why am I losing the dynamic keys for my object while my count attribute is still good?
A thing to remember about your reduce function is that the values passed to it are either the output of your map function, or the return value of previous calls to reduce.
This is key: it means the mapping and reducing of parts of the data can be farmed out to different machines (e.g. different shards of a Mongo cluster), and reduce can then be used again to reassemble the data. It also means that Mongo doesn’t have to first map every value, keep all the results in memory, and then reduce them all: it can map and reduce in chunks, re-reducing where necessary.
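A rough way to see this outside of MongoDB is to call the question’s reduce function by hand in plain JavaScript, once in a single pass and once in chunks. This is only a sketch: the three sample values and the way they are split into chunks are made up for illustration, and the printjson call is left out.

```javascript
// The reduce function from the question, without the printjson call.
function reduce(key, values) {
    var results = { count: 0, klass: { foo: 'bar' } };
    values.forEach(function (value) {
        results.count += value.count;
        results.klass[value.model] = 0;
    });
    return results;
}

// Hypothetical emitted values, shaped like the map function's output.
var emitted = [
    { model: 'Twitter::Error::BadRequest', count: 1 },
    { model: 'Stream:DirectMessage', count: 1 },
    { model: 'Twitter::Error::BadRequest', count: 1 }
];

// Single pass: every value has a model property, so both class names survive.
var singlePass = reduce('klass', emitted);

// Chunked: each chunk is reduced on its own, then the two partial
// results are reduced again (a re-reduce).
var partialA = reduce('klass', emitted.slice(0, 2));
var partialB = reduce('klass', emitted.slice(2));
var reReduced = reduce('klass', [partialA, partialB]);

// partialA and partialB have count and klass, but no model property,
// so value.model is undefined for both of them during the re-reduce.
console.log(JSON.stringify(singlePass));
console.log(JSON.stringify(reReduced));
// reReduced is { count: 3, klass: { foo: 'bar', undefined: 0 } }
```

The count stays correct because both the emitted values and the reduce output carry a count property; the class names are lost because only the emitted values carry model.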
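In other words, the following must be true: for any key, reducing a list that contains the result of an earlier reduce call must give the same result as reducing all of the originally emitted values in one pass.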
Your reduce function’s output doesn’t have a model property, so if it gets used in a re-reduce, those undefined values will crop up. You need to either have your reduce function return something similar in format to what your map function emits, so that you can process the two without distinction (usually the easiest), or else handle re-reduced values differently.
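As a minimal sketch of the first option, the map function could emit a value that already has the same shape as the reduce output, so reduce never needs to know whether it is looking at a mapped or a re-reduced value. The emit stand-in, the sample documents, and the chunking below are assumptions added so the snippet can run outside the mongo shell; inside MongoDB only the map and reduce functions would be passed in. The foo debugging key is dropped here.

```javascript
// Stand-in for MongoDB's emit, only so this runs outside the shell.
var emitted = [];
function emit(key, value) {
    emitted.push(value);
}

// Emit a value shaped exactly like what reduce returns.
function map() {
    var value = { count: 1, klass: {} };
    value.klass[this.error_class] = 0;
    emit('klass', value);
}

// Merge counts and class names; works on mapped and re-reduced values alike.
function reduce(key, values) {
    var results = { count: 0, klass: {} };
    values.forEach(function (value) {
        results.count += value.count;
        Object.keys(value.klass).forEach(function (name) {
            results.klass[name] = 0;
        });
    });
    return results;
}

// Hypothetical documents; MongoDB would call map with each one as `this`.
[
    { error_class: 'Twitter::Error::BadRequest' },
    { error_class: 'Stream:DirectMessage' },
    { error_class: 'Twitter::Error::BadRequest' }
].forEach(function (doc) {
    map.call(doc);
});

// One pass over everything versus two chunks followed by a re-reduce.
var singlePass = reduce('klass', emitted);
var reReduced = reduce('klass', [
    reduce('klass', emitted.slice(0, 2)),
    reduce('klass', emitted.slice(2))
]);

console.log(JSON.stringify(singlePass));
console.log(JSON.stringify(reReduced));
// Both have count 3 and both class names as keys of klass.
```

Because the emitted values and the reduce output share one format, the result no longer depends on how the values are split into chunks.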