I have yet another Map/Reduce question.
I have a collection “example” which looks like this:
{
    "userid" : "somehash",
    "channel" : "Channel 1"
}
My Map/Reduce functions look like this:
var map = function () {
    emit(this.channel, {user:this.userid, count: 1});
}
var reduce = function (key, values) {
    var result = {total:0, unique:0};
    var temp = [];
    values.forEach(function (value) {
        result.total += value.count;
        if (temp.indexOf(value.user) == -1) {
            temp.push(value.user);
        }
    });
    result.unique += temp.length;
    return result;
}
Unfortunately, it gives me some really strange results:
{ "_id" : "Channel 1", "value" : { "total" : NaN, "unique" : 47 } }
{ "_id" : "Channel 2", "value" : { "total" : NaN, "unique" : 12 } }
{ "_id" : "Channel 3", "value" : { "total" : 6, "unique" : 6 } }
It seems that value.count resolves to null, and "unique" doesn't look correct either. What I want is to count all documents for each channel, and also to count how many distinct users appear in each channel. A document for the same user may occur several times in this collection, so I want to know both the total number of occurrences AND the number of unique users.
I followed this guide: http://www.mongodb.org/display/DOCS/MapReduce#MapReduce-ReduceFunction and I don't know why I get null thrown in my face. Very strange. Any good ideas on the subject?
Thanks for the advice and better wisdom.
This happens because map/reduce sometimes runs reduce over its own output, i.e. reduce is called on the result of an earlier reduce. But the result of your reduce does not have a count field (it has total and unique), so value.count is undefined there and the sum becomes NaN. You must always make sure that what map emits and what reduce returns have the same format. Read more about this in the documentation.
EDIT: Here's a simple demonstration of how you can fix this:
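A minimal sketch of such a fix, assuming map emits and reduce returns the same shape {user: [...], total: n}. The field names user and total are assumptions, inferred from the result.user.length remark, and the emit stub exists only so the snippet also runs outside the mongo shell, where emit is provided for you:

```javascript
// Stub for running outside MongoDB; inside mapReduce, emit() already exists.
var emitted = [];
function emit(key, value) { emitted.push({ key: key, value: value }); }

var map = function () {
    // Emit exactly the shape that reduce returns (assumed field names).
    emit(this.channel, { user: [this.userid], total: 1 });
};

var reduce = function (key, values) {
    var result = { user: [], total: 0 };
    values.forEach(function (value) {
        result.total += value.total;
        // Merge the user lists, keeping each userid only once.
        value.user.forEach(function (u) {
            if (result.user.indexOf(u) === -1) {
                result.user.push(u);
            }
        });
    });
    return result;
};

// Re-reducing a partial result is safe because the shapes match.
var partial = reduce("Channel 1", [
    { user: ["a"], total: 1 },
    { user: ["a"], total: 1 }
]);
var merged = reduce("Channel 1", [partial, { user: ["b"], total: 1 }]);
// merged.total is 3 and merged.user is ["a", "b"]
```

Because reduce can be fed its own output, the last two calls mimic what the server may do when it reduces a key in several passes.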
Now result.user.length should give you the unique users. I didn't test it, but it should work.
EDIT 2: It will be slow though, since .indexOf is quite an expensive function. You can make it faster by running two map/reduce jobs. First you map/reduce over the collection; count over the resulting collection will then give you the number of unique entries. To get the total number, you run a second map/reduce over those results. This should be a lot faster.
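The two-job idea can be sketched roughly as follows. This is one possible reading, not a tested MongoDB script: it assumes the first job keys on the (channel, userid) pair and the second job groups those results by channel. The names map1, reduce1, map2, reduce2 and the in-memory runMapReduce helper are hypothetical and only stand in for two real mapReduce runs:

```javascript
// Stub for running outside MongoDB; inside mapReduce, emit() already exists.
var emitted = [];
function emit(key, value) { emitted.push({ key: key, value: value }); }

// Job 1: one output document per distinct (channel, user) pair.
var map1 = function () {
    emit({ channel: this.channel, user: this.userid }, { count: 1 });
};
var reduce1 = function (key, values) {
    var result = { count: 0 };
    values.forEach(function (v) { result.count += v.count; });
    return result;
};

// Job 2: runs over job 1's output, whose documents look like
// { _id: { channel, user }, value: { count } }.
var map2 = function () {
    emit(this._id.channel, { total: this.value.count, unique: 1 });
};
var reduce2 = function (key, values) {
    var result = { total: 0, unique: 0 };
    values.forEach(function (v) {
        result.total += v.total;
        result.unique += v.unique;
    });
    return result;
};

// Hypothetical in-memory stand-in for a mapReduce run, for illustration only.
function runMapReduce(docs, map, reduce) {
    emitted = [];
    docs.forEach(function (doc) { map.call(doc); });
    var groups = {};
    emitted.forEach(function (e) {
        var k = JSON.stringify(e.key);
        if (!groups[k]) { groups[k] = { key: e.key, values: [] }; }
        groups[k].values.push(e.value);
    });
    return Object.keys(groups).map(function (k) {
        var g = groups[k];
        // As in MongoDB, reduce is skipped for keys with a single value.
        var value = g.values.length === 1 ? g.values[0] : reduce(g.key, g.values);
        return { _id: g.key, value: value };
    });
}
```

Neither reduce needs indexOf here: uniqueness comes from the compound key of the first job, and each job's map output has the same shape as its reduce result.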