Short version:
If I have an index {"category": 1}, and a document {"category": {type: "memory", class: "DDR400"}, how can I do a query such as {"category.type": "memory"} that uses my index?
Long version:
With MongoDB, I want to use an embedded document as a key for an index.
For example, I might have some documents such as this (for a hypothetical product database):
{"category": {"type": "hard-drive", "form_factor": "2.5in", "size": "500GB"}, ...}
{"category": {"type": "hard-drive", "form_factor": "3.5in", ...}, ...}
{"category": {"type": "memory", "class": "DDR400", ...}, ...}
For the above examples, I might want to do queries such as:
{"category.type": "hard-drive"}
{"category.type": "hard-drive", "category.form_factor": "2.5in"}
{"category.type": "memory"}
{"category.type": "memory", "category.class": "DDR400"}
My issues is creating an index. The document at http://www.mongodb.org/display/DOCS/Indexes#Indexes-DocumentsasKeys describes two options:
The first options is to create a compound index, for example { "category.type": 1, "category.class": 1 }. This does not work well for my case, as I might have many different types of sub-categories.
The second option is to use the document as the key: { "category": 1 }. Now a query such as {"category": {"type": "memory", "class": "DDR400"}} would use the index, but {"category": {"type": "memory"}} would return nothing, and {"category.type": "memory"} would not use the index. Is there a way to do a query using this index that would give the same results as {"category.type": "memory"}?
I suspect a query using something like {"category" {"$gt": ..., "$lt": ...} should work, but what should I put in the blank spaces there?
Creating a separate index for
category.type(probably in addition tocategory) seems like the best option.You could use a range query with
$gtand$lt. Those would work on the binary representation of the embedded object, which only works for the first (in storage order) field, and only if that first field is the same in all documents, so it is not very flexible, and easy to break.“memoryX” here serves as a cut-off point: Everything with “memory” will sort before that.
Note that this requires that the “type” field is the first one in the binary representation for all documents that have it. It also ONLY works for the “type” field (no way to query on other fields in the first position, you have to choose one up front), thus giving you practically no advantage over a dedicated “category.type” index (just space savings).
I was experimenting with this idea before, see this thread on the mailing list. It does work, but you have to be careful what you are doing:
Again, consider creating extra indexes on the fields that you need.
That would probably work then. I’d still do two composite indexes
a,banda,x, though. Or maybe justbandx. Given that a document containsborx, you probably already have effectively filtered out the irrelevant documents with regard toa( form_factor = 2.5in already tells you it is a hard disk, class = DDR400 already makes it memory).And after filtering by
a,b, you may not need an index to drill down further onc.By using this tricky query on the binary representation you are making yourself dependent on what could be called an implementation detail. You may be hit by drivers that like to re-order fields, or something like this issue about Mongo itself reshuffling things sometimes.