I have a collection in my database representing IP addresses pulled from various sources. A sample document looks like this:
{ "_id" : ObjectId("4e71060444dce16174378b79"), "ip" : "xxx.xxx.xxx.xxx", "sources" : { "Source1" : NumberLong(52), "Source2" : NumberLong(7) } }
Each object will have one or more sources.
My goal is to show the number of entries reported by each source without knowing the names of every possible source in advance (new ones can be added at any time). I tried to do this with map-reduce by emitting a 1 for each key in the sources hash of each object, but something seems to be wrong with my code. If I run the following:
var map_s = function(){
    for(var source in this.sources) {
        emit(source, 1);
    }
}
var red_s = function(key, values){
    var total = 0;
    values.forEach(function(){
        total++;
    });
    return total;
}
var op = db.addresses.mapReduce(map_s, red_s, {out: 'results'});
db.results.find().forEach(printjson);
I get
{ "_id" : "Source1", "value" : 12 }
{ "_id" : "Source2", "value" : 230 }
{ "_id" : "Source3", "value" : 358 }
{ "_id" : "Source4", "value" : 398 }
{ "_id" : "Source5", "value" : 39 }
{ "_id" : "Source6", "value" : 420 }
{ "_id" : "Source7", "value" : 156 }
These counts are far too small for the size of the database. For instance, I get the following in the shell if I count the documents for a specific source:
> db.addresses.count({"sources.Source4": {$exists: true}});
1260538
Where is my error?
Yes, there is a problem in your reduce method: it must be idempotent, meaning its return value must itself be valid input to another reduce call for the same key.
Remember that reduce() may be called many times on intermediary results.
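To see why this matters, here is a minimal sketch in plain JavaScript (no MongoDB involved) that imitates a re-reduce. The helper simulateReReduce and the chunk size of 100 are assumptions made up for illustration; MongoDB decides on its own how to batch values, so the real numbers will differ.

```javascript
// Reduce function from the question: counts elements instead of summing them.
function countingReduce(key, values) {
  var total = 0;
  values.forEach(function () {
    total++;
  });
  return total;
}

// Hypothetical helper (not a MongoDB API): reduces the emitted values in
// chunks, then reduces the partial results again, as a re-reduce would.
function simulateReReduce(reduceFn, key, values, chunkSize) {
  var partials = [];
  for (var i = 0; i < values.length; i += chunkSize) {
    partials.push(reduceFn(key, values.slice(i, i + chunkSize)));
  }
  return reduceFn(key, partials);
}

// Pretend 1000 documents each emitted a 1 for "Source4".
var emitted = [];
for (var n = 0; n < 1000; n++) {
  emitted.push(1);
}

// Single pass over all values: the count is correct.
var oneShot = countingReduce("Source4", emitted);

// Chunked pass: the final call only counts the 10 partial results.
var chunked = simulateReReduce(countingReduce, "Source4", emitted, 100);
```

Here oneShot is 1000 but chunked is 10, because the last call counts how many partial results it received instead of adding them up. That is the same kind of shrinkage seen in the results above.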
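Instead of counting the elements of values (total++), you need to add them up (total += value for each value), because on a re-reduce each element is already a partial count rather than a single 1.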
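As a minimal sketch, assuming the map function from the question stays unchanged, a reduce that sums its inputs could look like this. The two-step check at the bottom is only an illustration of a re-reduce, not something MongoDB requires you to write.

```javascript
// Sums the incoming values, so partial totals from earlier reduce calls
// are carried forward correctly.
var red_s = function (key, values) {
  var total = 0;
  values.forEach(function (value) {
    total += value;
  });
  return total;
};

// Illustration: reduce two batches of emitted 1s, then reduce the partials.
var firstPass = [red_s("Source4", [1, 1, 1]), red_s("Source4", [1, 1])];
var finalTotal = red_s("Source4", firstPass);

// Reducing everything in one call gives the same answer.
var singlePass = red_s("Source4", [1, 1, 1, 1, 1]);
```

Both finalTotal and singlePass come out as 5, which is the property the reduce function needs. The function can then be passed to db.addresses.mapReduce(map_s, red_s, {out: 'results'}) exactly as in the question.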