My data set consists of documents containing a field with an array of integers. When I count documents whose field contains elements from some range, index scan performance seems to decrease as the values in indexBounds get higher, even though the range scans the same number of values.
Test data:
for (var i = 0; i < 100000; i++) db.foo.insert({tts:(function(){var val = [];for(var j = 0; j < 100; j++) {val[j] = j} return val;})()});
db.foo.ensureIndex({tts:1});
Queries:
> db.foo.find({tts:{$elemMatch:{$gte:10, $lte:10}}}).explain()
{
    "cursor" : "BtreeCursor tts_1",
    "isMultiKey" : true,
    "n" : 100000,
    "nscannedObjects" : 100000,
    "nscanned" : 100000,
    "scanAndOrder" : false,
    "indexOnly" : false,
    "nYields" : 1,
    "nChunkSkips" : 0,
    "millis" : 313,
    "indexBounds" : {
        "tts" : [
            [
                10,
                10
            ]
        ]
    },
    "server" : "localhost:27017"
}
> db.foo.find({tts:{$elemMatch:{$gte:90, $lte:90}}}).explain()
{
    "cursor" : "BtreeCursor tts_1",
    "isMultiKey" : true,
    "n" : 100000,
    "nscannedObjects" : 100000,
    "nscanned" : 100000,
    "scanAndOrder" : false,
    "indexOnly" : false,
    "nYields" : 1,
    "nChunkSkips" : 0,
    "millis" : 1286,
    "indexBounds" : {
        "tts" : [
            [
                90,
                90
            ]
        ]
    },
    "server" : "localhost:27017"
}
In fact, I have nearly 200 values in this field, and the query gets up to 10 times slower when the requested range has the highest boundaries. (Each value in the field belongs to a unique range, all ranges select the same number of objects (100000), and queries are performed only on subranges of these ranges.)
Collection stats:
> db.foo.stats()
{
    "ns" : "test.foo",
    "count" : 100000,
    "size" : 122400128,
    "avgObjSize" : 1224.00128,
    "storageSize" : 140763136,
    "numExtents" : 12,
    "nindexes" : 2,
    "lastExtentSize" : 40071168,
    "paddingFactor" : 1,
    "systemFlags" : 1,
    "userFlags" : 0,
    "totalIndexSize" : 254845920,
    "indexSizes" : {
        "_id_" : 3262224,
        "tts_1" : 251583696
    },
    "ok" : 1
}
Is there a workaround for this problem?
Thanks.
MongoDB is able to use the index to determine that each of the documents contains an element matching the $gte and $lte conditions. However, $elemMatch requires that a single element match both conditions, so MongoDB scans each of the documents (and the subarray) to determine whether such an element exists. For the larger values, it has to scan 90 elements into each array instead of just the first 10 to find a matching element. Thus a query that matches elements towards the end of a long array will take longer.
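To make the cost difference concrete, here is a minimal sketch that models the per-document check as a plain linear scan over the array. It is only an illustration of the explanation above, not MongoDB's actual matching code; the function name `elementsExamined` and the variables `costLow` and `costHigh` are made up for this example.

```javascript
// Hypothetical model of the per-document check, not MongoDB's actual code.
// Returns how many array elements are examined before one satisfies
// both bounds (or the array length if none does).
function elementsExamined(arr, gte, lte) {
  for (var i = 0; i < arr.length; i++) {
    if (arr[i] >= gte && arr[i] <= lte) {
      return i + 1;
    }
  }
  return arr.length;
}

// Same array shape as the test data in the question: [0, 1, ..., 99].
var tts = [];
for (var j = 0; j < 100; j++) {
  tts.push(j);
}

// The matching element sits at index 10 for the first query
// and at index 90 for the second one.
var costLow = elementsExamined(tts, 10, 10);
var costHigh = elementsExamined(tts, 90, 90);
```

Under this model the second query examines roughly nine times as many elements per document as the first, which is in the same direction as the difference in the reported millis values (313 vs. 1286), although the real timings include other costs.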
Note that if you reverse the array, the behavior is reversed.
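One way to try this yourself is to load the same test data with each array stored in descending order. The following is a sketch under assumptions: the helper `reversedValues` and the collection name `foo_reversed` are hypothetical, and the insert only runs inside the mongo shell, where `db` is defined.

```javascript
// Builds [n-1, n-2, ..., 0]: the reverse of the array used in the question.
function reversedValues(n) {
  var val = [];
  for (var j = 0; j < n; j++) {
    val[j] = n - 1 - j;
  }
  return val;
}

// Only touches the database when run inside the mongo shell, where `db` exists.
// `foo_reversed` is a hypothetical collection name chosen for this example.
if (typeof db !== 'undefined') {
  for (var i = 0; i < 100000; i++) {
    db.foo_reversed.insert({tts: reversedValues(100)});
  }
  db.foo_reversed.ensureIndex({tts: 1});
}
```

With this data, the value 90 sits near the start of each array and the value 10 near the end, so running the two explain() queries from the question against this collection should show which one is slower.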
It looks like this might be related to https://jira.mongodb.org/browse/SERVER-6002. Using the latest development release might fix the problem at the cost of stability.