I currently have a MongoDB setup with a fairly large database (about 250m documents). At present, I have one main collection that has the majority of the data, which has a single index (time). This results in acceptable query times when only the time is in the where part of the query (the index is used).
The problem arises when I need a compound key: the time index alone uses about 2.5GB of memory and the server only has 4GB, so I don’t want to create a compound index. With one, the indexes would no longer all fit in memory, which would slow things down a lot.
So my question is this: can I query first for time, and then query that subset for the other variables?
I should point out that I am using the Ruby driver.
At the moment, my query looks like this (this is very slow):
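# find_one returns a single document rather than a cursor, so it cannot be
# sorted afterwards; sort the cursor and take the first match instead.
trade_stop_loss_time = ticks.find({
  "time" => { "$gt" => trade_time_open, "$lte" => trade_time_close },
  "bid" => { "$lte" => stop_loss_price }
}).sort("time" => 1).limit(1).first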
Thanks!
If you simply perform the query you present, the database should be smart enough to do exactly that.
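To make "exactly that" concrete, here is a toy model in plain Ruby of what such a plan amounts to: a sorted array stands in for the time index, a binary search finds the start of the time range, and only the ticks inside the range are checked against the bid condition. This is a sketch for illustration, not how MongoDB is implemented; the Tick struct, the method name and the sample data are all made up.

```ruby
# Toy model of an index range scan followed by a filter on the candidates.
# All names and data here are hypothetical and only serve as illustration.
Tick = Struct.new(:time, :bid)

# Return the earliest tick with open_time < time <= close_time and
# bid <= stop_loss, or nil if there is none.
# `ticks_by_time` must be sorted by time; it stands in for the time index.
def first_stop_loss_tick(ticks_by_time, open_time, close_time, stop_loss)
  # Binary search for the first tick strictly after open_time (the "$gt" bound).
  start = ticks_by_time.bsearch_index { |t| t.time > open_time }
  return nil if start.nil?

  # Walk forward in time order; only ticks inside the range are examined.
  ticks_by_time[start..-1].each do |tick|
    break if tick.time > close_time       # past the "$lte" bound, stop scanning
    return tick if tick.bid <= stop_loss  # first match in time order
  end
  nil
end

ticks = [
  Tick.new(1, 1.30), Tick.new(2, 1.28), Tick.new(3, 1.25),
  Tick.new(4, 1.22), Tick.new(5, 1.21)
]
hit = first_stop_loss_tick(ticks, 1, 4, 1.25)
```

The point of the sketch is that the bid check only ever touches documents inside the time window, which is the work the server does for you when it uses the time index.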
The query you have should basically filter down the candidate set using the time index, then scan the remaining objects for the bid parameter. This should be a lot more efficient than doing the scan on the client.
You should definitely run explain() on your query to find out what it’s doing. If it uses an index (BtreeCursor) and the number of scanned objects is just the number of items in the given time frame, it’s doing fine. I don’t think there’s a better way than that, given your constraints. Doing the same operation on the client will definitely be slower.
Of course, a limit and a small time frame will help to make your query faster, but these might be external factors. mongostat might also help to find the problem.
However, if your documents and/or time spans are large, it might still be better to add the compound index: loading a lot of large documents from disk (since your RAM is already full) will take some time. Paging the index from disk is also slow, but it’s much less data.
Only an experiment can give a definitive answer.
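As a starting point for such an experiment, the check on the explain() output described above can be written down as a small helper. This is a sketch that assumes the older explain format, in which the result is a hash with "cursor" and "nscanned" fields (the format that reports BtreeCursor); the helper name and the sample plan are hypothetical, and the numbers are hand-written, not measured.

```ruby
# Summarize a legacy-format explain() result for a time-range query.
# `plan` is the Hash returned by explain(); `docs_in_range` is the number of
# documents inside the time window, which has to be counted separately.
def summarize_plan(plan, docs_in_range)
  uses_index = plan["cursor"].to_s.start_with?("BtreeCursor")
  scanned    = plan["nscanned"].to_i
  {
    uses_index: uses_index,
    scanned: scanned,
    # Fine if the index is used and nothing outside the window was scanned.
    looks_fine: uses_index && scanned <= docs_in_range
  }
end

# Hypothetical explain output, written by hand for illustration only.
sample_plan = { "cursor" => "BtreeCursor time_1", "nscanned" => 1200, "n" => 1, "millis" => 4 }
summary = summarize_plan(sample_plan, 1200)
```

With the Ruby driver, the plan itself would typically come from calling explain on the cursor for the same selector, and the count from a query on the time range alone. If uses_index is false, or scanned is far larger than the number of documents in the window, the query is not using the time index the way the answer expects.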