I have a collection that contains user objects, each with a unique ID and some other stuff. This collection could have millions of entries. My question is how expensive would a query be that takes a list of say 300 UIDS, and then checks which of those exist in the collection?
I think there are two parts to this question: #1, the query, and #2, the performance.
1: The query
This can easily be done using the $in clause.
2: The performance
The thing about the $in clause is that there is only one logical way for the DB to execute it: it's basically going to do one index search for each item you pass in.
Now if you follow standard practice and keep the whole index in RAM, then this query is probably going to come in under a second or so. I have some beefy servers holding hundreds of millions of entries, and such a search for 100 "UIDS" comes back in about 500ms.
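As a rough sketch of the $in lookup discussed above, here is how the existence check might look from Python. This assumes a PyMongo-style collection object and an indexed field named "uid"; both the field name and the helper name `existing_uids` are made up for illustration, so adjust them to your schema.

```python
def existing_uids(collection, uids, field="uid"):
    """Return the subset of `uids` that exist in `collection`.

    `collection` is assumed to behave like a pymongo Collection,
    and `field` (hypothetical name) is assumed to be indexed.
    """
    # Ask only for the UID field so the server sends back small documents.
    cursor = collection.find({field: {"$in": list(uids)}}, {field: 1, "_id": 0})
    return {doc[field] for doc in cursor}
```

Anything in your input list that is missing from the returned set does not exist in the collection.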
YMMV. You may get better performance by chunking the list and running multiple simultaneous queries, just to ensure that multiple threads are working on the server.
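If you want to try the chunking idea, a minimal sketch might look like the following. It assumes a PyMongo-style collection that is safe to share across threads, an indexed field named "uid", and arbitrary values for `chunk_size` and `workers`; all of those names and numbers are placeholders, and whether this actually beats a single $in query is something you would have to measure on your own setup.

```python
from concurrent.futures import ThreadPoolExecutor


def chunks(items, size):
    """Yield consecutive slices of `items`, each at most `size` long."""
    items = list(items)
    for start in range(0, len(items), size):
        yield items[start:start + size]


def existing_uids_chunked(collection, uids, field="uid", chunk_size=100, workers=4):
    """Run one $in query per chunk in parallel and merge the results."""

    def lookup(batch):
        # Same existence check as a single query, limited to one chunk.
        cursor = collection.find({field: {"$in": batch}}, {field: 1, "_id": 0})
        return {doc[field] for doc in cursor}

    found = set()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Each chunk is sent as its own query from a separate worker thread.
        for result in pool.map(lookup, chunks(uids, chunk_size)):
            found |= result
    return found
```

With 300 UIDs and the defaults above, this issues three queries of 100 UIDs each.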