I’ve got a MongoDB collection that holds about 100M documents. The documents basically look

Question

0

Editorial Team

Asked: May 31, 20262026-05-31T12:18:28+00:00 2026-05-31T12:18:28+00:00

I’ve got a MongoDB collection that holds about 100M documents. The documents basically look

0

I’ve got a MongoDB collection that holds about 100M documents.

The documents basically look like this:

_id             : ObjectId("asd1234567890")
_reference_1_id : ObjectId("fgh4567890123")
_reference_2_id : ObjectId("jkl7890123456")
name            : "Test1"
id              : "4815162342"
created_time    : Date( 1331882436000 )
_contexts       : ["context1", "context2"]
...

There are some indexes set, here’s the output of db.mycoll.getIndexes();

[
{
    "v" : 1,
    "key" : {
        "_id" : 1
    },
    "ns" : "mydb.mycoll",
    "name" : "_id_"
},
{
    "v" : 1,
    "key" : {
        "_reference_1_id" : 1,
        "_reference_2_id" : 1,
        "id" : 1
    },
    "unique" : true,
    "ns" : "mydb.mycoll",
    "name" : "_reference_1_id_1__reference_2_id_1_id_1"
},
{
    "v" : 1,
    "key" : {
        "_reference_1_id" : 1,
        "_reference_2_id" : 1,
        "_contexts" : 1,
        "created_time" : 1
    },
    "ns" : "mydb.mycoll",
    "name" : "_reference_1_id_1__reference_2_id_1__contexts_1_created_time_1"
}
]

When I execute a query like

db.mycoll.find({"_reference_2_id" : ObjectId("jkl7890123456")})

it takes over an hour (!) until it’s finished, no matter if there are results or not.
Any ideas?

Update:
Here’s what the output of

db.mycoll.find({"_reference_2_id" : ObjectId("jkl7890123456")}).explain();

looks like:

{
"cursor" : "BasicCursor",
"nscanned" : 99209163,
"nscannedObjects" : 99209163,
"n" : 5007,
"millis" : 5705175,
"nYields" : 17389,
"nChunkSkips" : 0,
"isMultiKey" : false,
"indexOnly" : false,
"indexBounds" : {

}
}

Report

Leave an answer
Cancel reply

You must login to add an answer.

Need An Account,

1 Answer

Editorial Team · Answer 1 · 2026-05-31T12:18:29+00:00

You don’t have any index that mongo will automatically use for that, so it’s doing a full table scan.

As mentioned in the docs

If the first key [of the index] is not present in the query, the index will only be used if hinted explicitly.

Why

If you have an index on a,b – and you search by a alone – an index will automatically be used. This is because it’s the start of the index (which is fast to do), the db can just ignore the rest of the index value.

An index on a,b is inefficient when searching by b alone simply because it doesn’t give the possibility to use the index searching with “starts with thisfixedstring”.

So, either:

Include _reference_1_id in the query (probably irrelevant)
OR add an index on _reference_2_id (if you query by the field often)
OR use a hint

Hint

Probably your lowest-cost option right now.

Add a query hint to force using your _reference_1_id_1__reference_2_id_1_id_1 index. Which is likely to be a lot faster than a full table scan, but still a lot slower than an index which starts with the field you are using in the query.

i.e.

db.mycoll
    .find({"_reference_2_id" : ObjectId("jkl7890123456")})
    .hint("_reference_1_id_1__reference_2_id_1_id_1");

Sign Up

Sign In

Forgot Password

The Archive Base Latest Questions

I’ve got a MongoDB collection that holds about 100M documents. The documents basically look

Leave an answerCancel reply

1 Answer

Leave an answer
Cancel reply