I have an interesting problem. I have a working M/R version of this but

Question

Asked: 2026-05-29T14:50:41+00:00

I have an interesting problem. I have a working M/R version of this, but it’s not really a viable solution in a small-scale environment, since it’s too slow and the query needs to be executed in real time.

I would like to iterate over each element in a collection and score it, sort by score in descending order, limit to the top 10, and return the results to the application.

Here is the function I’d like applied to each document:

function scoreDocument(document, someMap) {
    var score = 0;
    document.Tags.forEach(function (tag) {
        score += someMap[tag] || 0;  // assumption: tags missing from someMap add nothing
    });
    return score;
}


1 Answer

Editorial Team · Answer 1 · 2026-05-29T14:50:44+00:00

Since your someMap is changing each time, I don’t see any alternative other than to score all the documents and return the highest-scoring ones. Whatever method you adopt for this type of operation, you’ll have to consider all the documents in the collection, which is going to be slow, and will become more and more costly as the collection you’re scanning grows.

One issue with map reduce is that each mongod instance can run only one map reduce at a time. This is a limitation of the JavaScript engine, which is single-threaded. Multiple map reduces will be interleaved, but they cannot run concurrently with one another. This means that if you’re relying on map reduce for “real-time” uses (that is, if your web page has to run a map reduce to render), you’ll eventually hit a limit where page load times become unacceptably slow.

You can work around this by querying all the documents into your application, and doing the scoring, sorting, and limiting in your application code. Queries in MongoDB can run concurrently, unlike map reduce, though of course this means that your application servers will have to do a lot of work.
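As a rough sketch of that application-side approach: the snippet below assumes the documents have already been fetched into an array by whatever driver you use, and that each one carries an _id and a tags array. The helper name topScored and the sample data are made up for illustration.

```javascript
// Hypothetical helper: score already-fetched documents in application code.
// Assumes tags that are missing from someMap contribute 0 to the score.
function topScored(docs, someMap, limit) {
    var scored = docs.map(function (doc) {
        var score = 0;
        (doc.tags || []).forEach(function (tag) {
            // Only count tags that are real keys of someMap.
            if (Object.prototype.hasOwnProperty.call(someMap, tag)) {
                score += someMap[tag];
            }
        });
        return {_id: doc._id, score: score};
    });
    // Sort by score, descending, then keep the first `limit` entries.
    scored.sort(function (x, y) { return y.score - x.score; });
    return scored.slice(0, limit);
}

// Made-up sample data standing in for the result of a query on the collection.
var docs = [
    {_id: 1, tags: ["a", "b"]},
    {_id: 2, tags: ["b"]},
    {_id: 3, tags: ["a", "a", "c"]}
];
var top = topScored(docs, {"a": 5, "b": 2}, 2);
```

Every document still has to cross the network and be scored, so this trades database load for application-server load rather than reducing the total work.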

Finally, if you are willing to wait for MongoDB 2.2 to be released (which should be within a few months), you can use the new aggregation framework in place of map reduce. You’ll have to massage the someMap to generate the correct pipeline steps. Here’s an example of what this might look like if someMap were {"a": 5, "b": 2}:

db.runCommand({aggregate: "foo",
    pipeline: [
        {$unwind: "$tags"},
        {$project: {
            tag1score: {$cond: [{$eq: ["$tags", "a"]}, 5, 0]},
            tag2score: {$cond: [{$eq: ["$tags", "b"]}, 2, 0]}}
        },
        {$project: {score: {$add: ["$tag1score", "$tag2score"]}}},
        {$group: {_id: "$_id", score: {$sum: "$score"}}},
        {$sort: {score: -1}},
        {$limit: 10}
    ]})

This is a little complicated, and bears explaining:

1. First, we “unwind” the tags array, so that the following steps in the pipeline process documents where “tags” is a scalar (the value of one tag from the array) and all the other document fields (notably _id) are duplicated for each unwound element.
2. We use a projection operator to convert from tags to named score fields. The $cond/$eq expression for each roughly means (for the tag1score example) “if the value of the ‘tags’ field in the document is equal to ‘a’, then return 5 and assign that value to a new field tag1score, else return 0 and assign that”. This expression would be repeated for each tag/score combination in your someMap. At this point in the pipeline, each document will have N tagNscore fields, but at most one of them will have a non-zero value.
3. Next we use another projection operator to create a score field whose value is the sum of the tagNscore fields in the document.
4. Next we group the documents by their _id, and sum up the value of the score field from the previous step across all documents in each group.
5. We sort by score, descending (i.e. greatest scores first).
6. We limit to only the top 10 scores.
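To make the shape of the data at each stage concrete, here is a plain-JavaScript imitation of the unwind, the two projections, and the group for a single made-up document, using the someMap {"a": 5, "b": 2} from the example. It does not talk to MongoDB; it only mirrors what each stage would produce.

```javascript
// One hypothetical input document.
var doc = {_id: 1, tags: ["a", "b", "a"]};

// $unwind: one document per array element, with _id duplicated.
var unwound = doc.tags.map(function (tag) {
    return {_id: doc._id, tags: tag};
});

// $project with $cond/$eq: at most one tagNscore field is non-zero.
var projected = unwound.map(function (d) {
    return {
        _id: d._id,
        tag1score: d.tags === "a" ? 5 : 0,
        tag2score: d.tags === "b" ? 2 : 0
    };
});

// $project with $add: collapse the tagNscore fields into a single score.
var scored = projected.map(function (d) {
    return {_id: d._id, score: d.tag1score + d.tag2score};
});

// $group with $sum: add up score across the documents sharing this _id.
var total = scored.reduce(function (sum, d) { return sum + d.score; }, 0);
```

The sort and limit only become interesting once more than one _id is in play, so they are left out here.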

I’ll leave it as an exercise to the reader how to convert someMap into the correct set of projections in step 2, and the correct set of fields to add in step 3.
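For what it's worth, one possible way to do that conversion is sketched below. It assumes someMap is a non-empty plain object mapping tag strings to numeric weights; the function name buildScorePipeline and its limit parameter are illustrative, not part of any MongoDB API. I have not checked how the server treats degenerate cases such as a map with a single entry.

```javascript
// Hypothetical helper: turn someMap into the pipeline described above.
function buildScorePipeline(someMap, limit) {
    var tagScores = {};    // one $cond field per tag, for the first projection
    var scoreFields = [];  // field references to add up in the second projection
    var n = 0;
    for (var tag in someMap) {
        if (!Object.prototype.hasOwnProperty.call(someMap, tag)) continue;
        n += 1;
        var field = "tag" + n + "score";
        tagScores[field] = {$cond: [{$eq: ["$tags", tag]}, someMap[tag], 0]};
        scoreFields.push("$" + field);
    }
    return [
        {$unwind: "$tags"},
        {$project: tagScores},
        {$project: {score: {$add: scoreFields}}},
        {$group: {_id: "$_id", score: {$sum: "$score"}}},
        {$sort: {score: -1}},
        {$limit: limit}
    ];
}

var pipeline = buildScorePipeline({"a": 5, "b": 2}, 10);
// In the shell this would then be passed along as:
// db.runCommand({aggregate: "foo", pipeline: pipeline})
```

Tags that begin with "$" would need extra care, since the pipeline would read them as field references rather than literal strings.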

This is essentially the same set of steps that your application code or map reduce would go through, but it has two distinct advantages. Unlike map reduce, the aggregation framework is implemented entirely in C++, and is faster and more concurrent. And unlike querying all the documents into your application, the aggregation framework works with the data on the server side, saving network load. But like the other two approaches, this will still have to consider each document, and can only limit the result set once the score has been calculated for all of them.
