Ok, MongoDB experts, please take a look at my collection:
[{
"_id" : "item_0",
"Name" : "Item 0",
"Description" : "Some description for this item...",
"Properties" : {
"a" : 5.0,
"b" : 0.0,
"c" : 6.0,
"d" : 6.0,
"e" : 2.0,
"f" : 0.0,
"g" : 9.0,
"h" : 3.0,
"i" : 4.0,
"j" : 5.0
}
},
{ // 5.000-10.000 more items... }
]
I am using this aggregation to multiply a set of selected properties (in this case a, b, c and d) and then sort the items by the resulting product:
{
"aggregate": "item",
"pipeline": [
{
"$project": {
"_id": 1,
"Name": 1,
"s": {
"$multiply": [
"$Properties.a",
"$Properties.b",
"$Properties.c",
"$Properties.d"
]
}
}
},
{
"$sort": {
"s": -1
}
},
{
"$limit": 100
}
]
}
This works fine, but as the number of items and properties grows, the aggregation takes much longer to execute.
Is there a more efficient way to achieve this? The search for the highest product of a set of properties must be snappy. Is there a way to index this, covering all the different combinations of properties, and have the results cached? It’s OK if the indexing takes a while, as long as the querying is fast!
Thanks for any help in this matter, I appreciate it a lot!
Given your requirement for faster searching and efficiency, I think a better approach would be to use Map/Reduce with an output collection (at least until such time as the Aggregation Framework supports using a collection for output).
There are several advantages to using an output collection for your use case.
In particular:
You can use the merge output option for Map/Reduce to update calculations in your output collection (essentially, this would be your cache).
Depending on how often your various properties are updated, I would investigate an incremental approach based on a “last updated” timestamp or some other criteria that allows you to determine when values need to be recalculated. This would allow you to keep the batch sizes more manageable as your collection grows.
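As a rough illustration of the merge output and incremental recalculation described above, here is a minimal mongo shell sketch. The output collection name item_products, the LastUpdated field, and the lastRun value are assumptions for the example; they do not appear in the question's schema, so adjust them to whatever your application actually maintains.

```javascript
// Properties whose product we want to precompute (a, b, c, d as in the question).
var SELECTED = ["a", "b", "c", "d"];

// Map: emit the product of the selected properties for each item.
// Inside MongoDB, `this` is the current document and `emit` is supplied by the server.
function mapProduct() {
  var product = 1;
  for (var i = 0; i < SELECTED.length; i++) {
    product *= this.Properties[SELECTED[i]];
  }
  emit(this._id, product);
}

// Reduce: every _id is unique, so there is only ever one value per key.
function reduceProduct(key, values) {
  return values[0];
}

// Only runs inside the mongo shell, where `db` is defined.
if (typeof db !== "undefined") {
  // Assumption: the time of the previous run is stored somewhere by your application.
  var lastRun = new Date(0);

  db.item.mapReduce(mapProduct, reduceProduct, {
    // Hypothetical "LastUpdated" field: recalculate only items changed since lastRun.
    query: { LastUpdated: { $gt: lastRun } },
    // Make SELECTED visible to the map function on the server.
    scope: { SELECTED: SELECTED },
    // Merge into the cache collection (hypothetical name), overwriting changed items.
    out: { merge: "item_products" }
  });

  // Map/Reduce output documents have the shape { _id: ..., value: ... }.
  db.item_products.createIndex({ value: -1 });
  db.item_products.find().sort({ value: -1 }).limit(100).forEach(printjson);
}
```

The final query reads the precomputed products from the indexed output collection, so the sort no longer has to be computed over the whole item collection on each request. This sketch keeps one output collection per property combination; if you need many combinations, you would have to choose a naming or keying scheme for them. Newer MongoDB releases also provide $out and $merge aggregation stages, which may remove the need for Map/Reduce here.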