The Archive Base

Editorial Team
Asked: May 31, 2026

Ideally, I have a Mongo document that looks like below. I want the ability to query for any two of the attributes, and then order by a third.

Document:

{
 "tags" => ["ads", "shopping", "web20", "newspaper", "others..."],
 "reachable_via" => ["email", "twitter", "facebook", "contact_form", "phone"],
 "keywords" => ["keyword1", "keyword2", "keyword3"],
 "score" => 4, # scalar from 0 to 10
 "read_in_project_ids" => [124, 433, 556]
}

Example query, using Mongoid syntax:

Document.any_in(:keywords => ["keyword1", "keyword2"]).where(:tags.in => ["ads", "shopping"], :reachable_via.in => ["email"]).order_by([:score, :desc]).limit(10)

This query works, but it doesn't use indexes. I've also tried restructuring the document three different ways, without any luck.

Right now, I have 3.8 million documents, and this query can take 45-60 seconds to return.

So, how should I restructure the document to keep the flexibility of a set of array fields while gaining the benefits of indexing?

FYI, keywords can number in the hundreds (and are added by users), but the tags and reachable_via values are fixed and controlled by the application's code: reachable_via has 7 options and tags has about 20, and both lists will grow.

Thanks!

1 Answer

  1. Editorial Team
    Added an answer on May 31, 2026 at 3:41 pm

    The problem is the $in combined with the sort.

    If you can remove one or the other, your query will speed up significantly.

    Since a single compound index can't contain more than one array-valued field (indexes on arrays are called multikey indexes), you want to index the most granular (most selective) array in your query. In your example query, that would likely be keywords.

    So, to make your query a bit faster, put an index on {keywords:1, score:-1}. The query will then scan the keywords index, filter the matching documents against the remaining conditions on tags and reachable_via, and sort the result by score descending. I tested this with a collection of 5 million documents similar to yours, and it used the index on the values that actually did a good job of filtering.

    Here’s an example query from the mongo shell (sorry, I’m not a mongoid expert):

    > db.test.find({keywords:{$in:["keyword15", "keyword18"]}, tags:{$in:["shopping","web20"]}, reachable_via:{$in:["email"]}}).sort({score:-1}).limit(10).explain();
    {
    "cursor" : "BtreeCursor keywords_1_score_-1 multi",
    "nscanned" : 1750873,
    "nscannedObjects" : 1750872,
    "n" : 10,
    "scanAndOrder" : true,
    "millis" : 11999,
    "nYields" : 0,
    "nChunkSkips" : 0,
    "isMultiKey" : true,
    "indexOnly" : false,
    "indexBounds" : {
        "keywords" : [
            [
                "keyword15",
                "keyword15"
            ],
            [
                "keyword18",
                "keyword18"
            ]
        ],
        "score" : [
            [
                {
                    "$maxElement" : 1
                },
                {
                    "$minElement" : 1
                }
            ]
        ]
    }
    }
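
The scanAndOrder: true in the output above can be illustrated with a toy model of the compound index. This is only a sketch in plain JavaScript with made-up entries, not how the server is implemented: it assumes the index is a list ordered by keyword ascending and then score descending, and shows that one keyword range comes out already sorted by score while two concatenated ranges do not.

```javascript
// Toy model of a compound index {keywords: 1, score: -1}.
// Each entry is [keyword, score, docId]; the data below is made up.
const entries = [
  ["keyword18", 7, "a"],
  ["keyword15", 4, "b"],
  ["keyword15", 9, "a"],
  ["keyword18", 2, "e"],
  ["keyword15", 1, "c"],
  ["keyword18", 8, "d"],
];

// Index order: keyword ascending, then score descending.
entries.sort((x, y) =>
  x[0] < y[0] ? -1 : x[0] > y[0] ? 1 : y[1] - x[1]
);

// Walk the index ranges for the requested keywords, in index order.
function scanScores(keywords) {
  return entries.filter((e) => keywords.includes(e[0])).map((e) => e[1]);
}

// True when the scores already come out in descending order.
function isSortedDesc(scores) {
  return scores.every((s, i) => i === 0 || scores[i - 1] >= s);
}

const single = scanScores(["keyword15"]);             // [9, 4, 1]
const multi = scanScores(["keyword15", "keyword18"]); // [9, 4, 1, 8, 7, 2]

console.log(isSortedDesc(single)); // true: results can be returned as scanned
console.log(isSortedDesc(multi));  // false: a separate sort pass is needed
```

In this model, the second case is what forces every matching entry to be read before the top 10 can be chosen.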
    

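    If you can change your query to match only one keyword, it uses the index much more efficiently, getting the top 10 scores for that keyword in 0ms. For a single keyword, the index entries are already ordered by score, so no separate sort is needed (note that scanAndOrder is absent from the output below) and the scan can stop after a handful of entries.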

    > db.test.find({keywords:"keyword15", tags:{$in:["shopping","web20"]}, reachable_via:{$in:["email"]}}).sort({score:-1}).limit(10).explain();
    {
    "cursor" : "BtreeCursor keywords_1_score_-1",
    "nscanned" : 14,
    "nscannedObjects" : 14,
    "n" : 10,
    "millis" : 0,
    "nYields" : 0,
    "nChunkSkips" : 0,
    "isMultiKey" : true,
    "indexOnly" : false,
    "indexBounds" : {
        "keywords" : [
            [
                "keyword15",
                "keyword15"
            ]
        ],
        "score" : [
            [
                {
                    "$maxElement" : 1
                },
                {
                    "$minElement" : 1
                }
            ]
        ]
    }
    }
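
If several keywords are still needed, one possible workaround is to run the fast single-keyword query once per keyword, each with the same filters, sort, and limit, and merge the results in the application. This is an assumption on my part, not something the answer tested. The helper name mergeTopByScore and the sample documents are hypothetical. A minimal sketch of the merge step:

```javascript
// Merge per-keyword result lists into one top-N list by score.
// Each list is assumed to be the output of one single-keyword query
// that used the same tags/reachable_via filters, sort, and limit.
function mergeTopByScore(resultLists, limit) {
  const seen = new Map();
  for (const list of resultLists) {
    for (const doc of list) {
      // A document holding several of the keywords appears in several lists,
      // so deduplicate on the string form of its _id.
      const key = String(doc._id);
      if (!seen.has(key)) seen.set(key, doc);
    }
  }
  return [...seen.values()]
    .sort((a, b) => b.score - a.score)
    .slice(0, limit);
}

// Hypothetical results for "keyword15" and "keyword18" (made-up data).
const forKeyword15 = [
  { _id: 1, score: 9 },
  { _id: 2, score: 6 },
  { _id: 3, score: 3 },
];
const forKeyword18 = [
  { _id: 4, score: 8 },
  { _id: 2, score: 6 },
  { _id: 5, score: 5 },
];

const top3 = mergeTopByScore([forKeyword15, forKeyword18], 3);
console.log(top3.map((d) => d._id)); // [ 1, 4, 2 ]
```

Any document in the overall top N must also be in the top N of at least one of its keywords, which is why fetching N per keyword is enough. The trade-off is one round trip per keyword.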
    

    Here’s another example. I moved the score out of the sort, and into the query (querying on an exact score, without a limit). This does a good job of speeding up the query, if you’re only looking for the top score, or something like that.

    > db.test.find({keywords:{$in:["keyword15", "keyword18"]}, tags:{$in:["shopping","web20"]}, reachable_via:{$in:["email"]}, score:9}).explain();
    {
    "cursor" : "BtreeCursor keywords_1_score_-1 multi",
    "nscanned" : 175583,
    "nscannedObjects" : 175581,
    "n" : 82345,
    "millis" : 999,
    "nYields" : 0,
    "nChunkSkips" : 0,
    "isMultiKey" : true,
    "indexOnly" : false,
    "indexBounds" : {
        "keywords" : [
            [
                "keyword15",
                "keyword15"
            ],
            [
                "keyword18",
                "keyword18"
            ]
        ],
        "score" : [
            [
                9,
                9
            ]
        ]
    }
    }
    

    Rinse, repeat for other query combinations. Pick the highest granularity array field in the query, index it along with the sorting field. If you can limit the query to not use $in on the indexed array, that’s ideal.
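
The rule of thumb above can be sketched as a small helper. It takes "highest granularity" to mean "most distinct values", which is my assumption, and the function name chooseIndexSpec and the counts are hypothetical (the keywords figure in particular is a guess).

```javascript
// Pick the array field with the most distinct values (an assumed proxy for
// "highest granularity") and pair it with the sort field.
function chooseIndexSpec(queriedFields, distinctCounts, sortField, sortDirection) {
  let best = null;
  for (const field of queriedFields) {
    if (best === null || distinctCounts[field] > distinctCounts[best]) {
      best = field;
    }
  }
  if (best === null) throw new Error("no queried fields given");
  // Key order matters in a compound index: filter field first, sort field last.
  return { [best]: 1, [sortField]: sortDirection };
}

// Rough counts based on the question; the keywords figure is a guess.
const distinctCounts = { keywords: 500, tags: 20, reachable_via: 7 };

const spec = chooseIndexSpec(["tags", "keywords", "reachable_via"], distinctCounts, "score", -1);
console.log(JSON.stringify(spec)); // {"keywords":1,"score":-1}

const spec2 = chooseIndexSpec(["tags", "reachable_via"], distinctCounts, "score", -1);
console.log(JSON.stringify(spec2)); // {"tags":1,"score":-1}
```

In the mongo shell, a spec like this is what would be passed to createIndex on the collection. Each query combination gets its own index, so it is worth checking with explain() that the extra indexes actually pay for themselves.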

    My test script is located here:
    https://gist.github.com/2091880

    The test script has a few weaknesses. For example, almost every document has keyword1, so even though that field is indexed, a collection scan turns out to be faster than querying on keyword1 through the index. I was a little lazy about randomizing the selection of keywords, but in real life that wouldn't be a problem.
