I am facing a problem with my query in MongoDB (pymongo driver). Here is my problem:

Question

Asked: June 17, 2026 (2026-06-17T15:56:39+00:00)

I am facing a problem with my query in MongoDB (pymongo driver).

Here is my problem:
I have to insert (update) about 100 million (100,000,000) documents into MongoDB per day.
I gave up on updating an existing document per key (appending to it on each update) and switched to bulk insert, because update performance is slower than bulk insert.

Here is a sketch of the schema in my db.

{_id:xxx, F1:1, F2:"test1", TS:"2011/01"}
{_id:xxx, F1:1, F2:"test2", TS:"2011/02"}
{_id:xxx, F1:2, F2:"test1", TS:"2011/03"}
{_id:xxx, F1:3, F2:"test1", TS:"2011/04"}
{_id:xxx, F1:2, F2:"test1", TS:"2011/05"}
.....
(4 billion documents or more)

When I query, I just want to retrieve the document with the latest TS for each value of F1 (field 1).

I know that the "group" operation can do that, but my db is sharded, and the group operation is not allowed on a sharded db.

I also tried map-reduce, but it does not give good enough query performance.

The only query I am using is the "$in" operation.

db.test.find({"F1":{"$in":[1,2,3,....]}})

It retrieves all docs whose F1 is in the target array, but I only want the latest document per key F1:

{_id:xxx, F1:1, F2:"test2", TS:"2011/02"}
{_id:xxx, F1:2, F2:"test1", TS:"2011/05"}
{_id:xxx, F1:3, F2:"test1", TS:"2011/04"}

How can I get that?

P.S.
The target array might contain a million elements that I want to bulk query.

Is there a good way to do that?

1 Answer

Editorial Team · Answer 1 · 2026-06-17T15:56:40+00:00

There's no single-step solution to this problem because, as you mentioned, you can't use the group operation on a sharded db (and it likely wouldn't perform well even if you could). You might want to explore a solution like this:

Create a new document collection which will be used as your index (but not an actual MongoDB index).
Inside this collection, you will store one document per unique F1 value. Each document contains a reference to the most recent document in your primary collection. You can use a conditional update (an upsert) to create the index document when necessary, or update it otherwise. The update's query should find the index document and match only if its timestamp is less than (or equal to) that of the newest document being inserted for that value. (maybe
You’d then use the “index collection” to fetch the latest document references for each F1 value.
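
The steps above could be sketched roughly like this with pymongo. This is only a sketch under some assumptions, not a tested production recipe: the field name `ref`, the helper names, and the batch size are made up for illustration; the index document uses the F1 value as its `_id`; and TS is assumed to be stored in a form that sorts chronologically (a date, or a zero-padded string such as "2011/05"). A retry without upsert covers the case where the upsert fails with a duplicate key, which can happen either because a newer entry already exists or because two writers raced to create the same index document.

```python
from itertools import islice


def build_latest_upsert(doc):
    """Return (filter, update) for the conditional upsert on the index collection.

    The filter matches the index document for this F1 value only when its
    stored TS is not newer than the TS of the document being inserted.
    """
    query = {"_id": doc["F1"], "TS": {"$lte": doc["TS"]}}
    update = {"$set": {"TS": doc["TS"], "ref": doc["_id"]}}  # "ref" is a made-up field name
    return query, update


def record_latest(index_coll, doc):
    """Point the index document for doc["F1"] at doc unless a newer one is recorded."""
    # Imported here so the pure helpers can be used without pymongo installed.
    from pymongo.errors import DuplicateKeyError

    query, update = build_latest_upsert(doc)
    try:
        index_coll.update_one(query, update, upsert=True)
    except DuplicateKeyError:
        # The filter did not match an existing index document, so the upsert
        # tried to insert a second document with the same _id. Retry as a
        # plain conditional update: it is a no-op if the stored TS is newer.
        index_coll.update_one(query, update)


def chunks(values, size):
    """Yield successive lists of at most `size` items from `values`."""
    it = iter(values)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def fetch_latest(index_coll, primary_coll, f1_values, batch_size=10000):
    """Yield the latest primary document for each requested F1 value."""
    for batch in chunks(f1_values, batch_size):
        # Look up the references first, then fetch the referenced documents.
        refs = [d["ref"] for d in index_coll.find({"_id": {"$in": batch}})]
        yield from primary_coll.find({"_id": {"$in": refs}})
```

You would call `record_latest(db.test_latest, doc)` for each document you insert into the primary collection (`test_latest` is a hypothetical collection name), and `fetch_latest(db.test_latest, db.test, [1, 2, 3])` when querying. Splitting the target array into batches keeps each "$in" query well below the BSON document size limit when the array has around a million elements.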
