In Java I have an object that looks like this:
class MyDoc {
ObjectId docId;
Map<String, String> someProps = new HashMap<String,String>();
}
which, when persisted to MongoDB, produces the following document:
{
"_id" : ObjectId("4fb538eb5e9e7b17b211d5d3"),
"someProps" : {
"4fda4993eb14ea4a4a149c04" : "PROCESSED",
"4f56a5c4b6f621f092b00525" : "PROCESSED",
"4fd95a2a0baaefd1837fe504" : "TODO"
}
}
I need to query as follows:
DBObject queryObj =
new BasicDBObject("someProps.4fda4993eb14ea4a4a149c04","PROCESSED");
DBObject explain =
getCollection().find(queryObj).hint("props_indx").explain();
which should read: find me the MyDoc documents that have a someProps entry with key “4fda4993eb14ea4a4a149c04” and value “PROCESSED”.
I have millions of MyDoc documents stored in the collection, so I need efficient indexing on the keys of the someProps embedded object.
The keys of the map are not known in advance (they are dynamically generated, not a fixed set), so I cannot create one index per someProps key. (At least I don’t think I can; correct me if I’m wrong.)
I tried to create the index directly on someProps, but querying took ages.
How can I index on the someProps Map keys?
Do I need a different document structure?
Important notes:
1. There can only be ONE element of someProps with a given key. For example:
{
"_id" : ObjectId("4fb538eb5e9e7b17b211d5d3"),
"someProps" : {
"4fda4993eb14ea4a4a149c04" : "PROCESSED",
"4f56a5c4b6f621f092b00525" : "PROCESSED",
"4f56a5c4b6f621f092b00525" : "TODO"
}
}
would be invalid because 4f56a5c4b6f621f092b00525 cannot appear twice in the Map (hence the use of a Map in the first place).
2. I also need to efficiently update someProps, changing only the value (e.g. changing “4fda4993eb14ea4a4a149c04” : “PROCESSED” to “4fda4993eb14ea4a4a149c04” : “CANCELLED”).
What are my options?
Thanks.
I suggest expanding these properties into documents of their own: each entry of someProps becomes a separate document whose _id is built from two parts, id1 and id2.
Here id1 is the id of your former parent entity (be it an application or whatever) and id2 is the property id. Uniqueness is enforced by the properties of the _id field. Atomic updates are trivial. Indexing is easy. The only disadvantage is some storage overhead.
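The restructuring described above might look like the following minimal sketch. It uses only plain Java maps to show the document shapes, not the MongoDB driver. The class name PropDocs, the helper names, and the field name "value" are hypothetical and do not come from the original posts, and the parent id is kept as a String for simplicity where a real collection would store an ObjectId.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class PropDocs {

    /** Builds the compound _id {id1: parentId, id2: propId}; field order is kept stable. */
    static Map<String, Object> compoundId(String parentId, String propId) {
        Map<String, Object> id = new LinkedHashMap<>();
        id.put("id1", parentId); // id of the former parent document
        id.put("id2", propId);   // former key inside someProps
        return id;
    }

    /** Expands one parent's someProps map into one document per property. */
    static List<Map<String, Object>> expand(String parentId, Map<String, String> someProps) {
        List<Map<String, Object>> docs = new ArrayList<>();
        for (Map.Entry<String, String> e : someProps.entrySet()) {
            Map<String, Object> doc = new LinkedHashMap<>();
            doc.put("_id", compoundId(parentId, e.getKey()));
            doc.put("value", e.getValue()); // "value" is an assumed field name
            docs.add(doc);
        }
        return docs;
    }

    /** Filter shape for "every parent whose property propId has this value". */
    static Map<String, Object> queryByProperty(String propId, String value) {
        Map<String, Object> query = new LinkedHashMap<>();
        query.put("_id.id2", propId);
        query.put("value", value);
        return query;
    }

    /** Update shape that changes only the value of one property document. */
    static Map<String, Object> setValue(String newValue) {
        Map<String, Object> fields = new LinkedHashMap<>();
        fields.put("value", newValue);
        Map<String, Object> update = new LinkedHashMap<>();
        update.put("$set", fields);
        return update;
    }

    public static void main(String[] args) {
        Map<String, String> someProps = new LinkedHashMap<>();
        someProps.put("4fda4993eb14ea4a4a149c04", "PROCESSED");
        someProps.put("4f56a5c4b6f621f092b00525", "PROCESSED");
        someProps.put("4fd95a2a0baaefd1837fe504", "TODO");

        // One document per property instead of one embedded map.
        for (Map<String, Object> doc : expand("4fb538eb5e9e7b17b211d5d3", someProps)) {
            System.out.println(doc);
        }
        System.out.println(queryByProperty("4fda4993eb14ea4a4a149c04", "PROCESSED"));
        System.out.println(setValue("CANCELLED"));
    }
}
```

With this shape, a filter on the full _id (both id1 and id2) addresses exactly one property document, so the update touches a single small document. One caveat worth checking against your MongoDB version: the default _id index covers the embedded _id as a whole, so a query on _id.id2 alone, as in queryByProperty, would probably need its own index on that field (possibly combined with value).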