I’m experimenting with the new AF to migrate away from Map/reduce. I have millions

Asked: June 12, 2026

I’m experimenting with the new aggregation framework (AF) to migrate away from Map/Reduce. I have millions of objects like this:

{
 _id: ObjectID,
 owner: 1,
 tags: [
   {text: "dog", score: 5}, 
   {text: "cat", score: 3}, 
   {text: "hamster", score:1}]
}

{
 _id: ObjectID,
 owner: 2,
 tags: [
   {text: "cat", score: 8}, 
   {text: "fish", score: 4}]
}

I want to produce a report that counts all occurrences of “cat” and “fish” in documents whose owner is X.

So far, assuming the input tags [“cat”, “fish”], my pipeline looks like this:

{
  $match: { owner: X, $in: {"tags.text": ["cat", "fish"]}}
}, {
  $project: {text: "$tags.text"},
}, {
  $unwind: "$text",
}, {
  $match: {"text": {$in: {"tags": ["cat", "fish"]}}
}, {
  $group: {"_id": "$text", "total: {"$sum": 1}}
}

The first $match is there only to narrow these millions of objects down to a subset, since I have an index on owner and “tags.text”.

This pipeline works fine for a small number of tags, but I need to be able to pass in 100 to 1000 tags and still get a quick result. It seems inefficient to project out and unwind all the tags, only to filter away 90% of them in the next $match stage.

Is there a more efficient way? Maybe reorder the pipeline steps?

1 Answer

Editorial Team · Answer 1 · 2026-06-12T03:45:11+00:00

This looks good to me, apart from some typos and the usage of the $in operator in each $match stage. The pipeline should probably read:

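{
  // Leading $match: can use the index on owner and "tags.text"
  $match: {owner: X, "tags.text": {$in: ["cat", "fish"]}}
}, {
  // Keep only the array of tag strings
  $project: {text: "$tags.text"}
}, {
  // Emit one document per tag string
  $unwind: "$text"
}, {
  // Drop unwound tags that were not requested
  $match: {"text": {$in: ["cat", "fish"]}}
}, {
  // Count occurrences of each requested tag
  $group: {"_id": "$text", "total": {"$sum": 1}}
}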
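Because the question needs to pass in 100 to 1000 tags, it may help to generate this pipeline from a tag array so both $match stages always use the same list. The following is a minimal sketch in JavaScript (the mongo shell's language). The helper name `buildTagCountPipeline` and the collection name `items` are hypothetical, not part of MongoDB or the original question.

```javascript
// Sketch: build the pipeline above for an arbitrary owner and tag list.
// buildTagCountPipeline is a hypothetical helper, not a MongoDB API.
function buildTagCountPipeline(owner, tags) {
  const wanted = [...new Set(tags)]; // de-duplicate the requested tags
  return [
    // Leading $match: can use the index on owner and "tags.text"
    { $match: { owner: owner, "tags.text": { $in: wanted } } },
    // Keep only the array of tag strings
    { $project: { text: "$tags.text" } },
    // Emit one document per tag string
    { $unwind: "$text" },
    // Discard unwound tags that were not requested
    { $match: { text: { $in: wanted } } },
    // Count occurrences of each requested tag
    { $group: { _id: "$text", total: { $sum: 1 } } },
  ];
}

const pipeline = buildTagCountPipeline(2, ["cat", "fish", "cat"]);
console.log(JSON.stringify(pipeline));

// In the mongo shell, assuming a collection named `items` (hypothetical):
// db.items.aggregate(buildTagCountPipeline(2, ["cat", "fish"]))
```

Building both $in lists from one array avoids the two $match stages drifting apart when the tag list changes.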

In essence, you want to place $match as early in the pipeline as possible to limit the number of documents processed by later stages. The initial match on owner and the requested tags accomplishes this. You also need to make sure that this $match, the equivalent of a .find(), uses the appropriate indexes.
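To see what each stage does to the two sample documents from the question, the plain-JavaScript sketch below mimics the pipeline in memory. It is an illustration of the stage semantics only, not how MongoDB executes a pipeline. The function name `tagReport` and the `matchedDocs`/`unwoundTags` counters are hypothetical, added just to show how many items reach each stage.

```javascript
// In-memory illustration only: mimics the pipeline stages with plain arrays.
// tagReport is a hypothetical name; no MongoDB driver is involved.
function tagReport(docs, owner, tags) {
  const wanted = new Set(tags);
  // Leading $match: this owner's documents containing at least one wanted tag
  const matched = docs.filter(
    (d) => d.owner === owner && d.tags.some((t) => wanted.has(t.text))
  );
  // $project + $unwind: one entry per tag string of every matched document
  const unwound = matched.flatMap((d) => d.tags.map((t) => t.text));
  // Second $match: keep only the requested tags
  const kept = unwound.filter((text) => wanted.has(text));
  // $group: count occurrences per tag
  const totals = new Map();
  for (const text of kept) totals.set(text, (totals.get(text) || 0) + 1);
  return {
    matchedDocs: matched.length,
    unwoundTags: unwound.length,
    totals: Object.fromEntries(totals),
  };
}

// The two sample documents from the question (_id omitted)
const docs = [
  { owner: 1, tags: [{ text: "dog", score: 5 }, { text: "cat", score: 3 }, { text: "hamster", score: 1 }] },
  { owner: 2, tags: [{ text: "cat", score: 8 }, { text: "fish", score: 4 }] },
];

console.log(tagReport(docs, 2, ["cat", "fish"]));
console.log(tagReport(docs, 1, ["cat", "fish"]));
```

For owner 1, the leading $match keeps one document, all three of its tags are unwound, and only “cat” survives the second $match. This is the overhead the question describes, and it grows with the number of non-requested tags per matched document.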
