I have a rather large collection in mongodb with approx 100 000 documents (not sharded). This is a backend for a web application that basically just allow the user to browse different ways to view the same information in this collection.
For one of the views I try to count the number of occurrences of a field using the aggregation framework. This means aggregating the entire collection. The problem is that this aggregation operation (which is a simple pipeline of group, sort and limit) takes 2 seconds which is too slow for a web application.
So my question is; what is the preferred solution in order to cache the result of this aggregation operation? As far as I have found, it is not possible to “aggregate into” a new collection, or anything like that. At the moment the only solution I have found is to read the entire result into an variable and then inserting this variable into a new collection using insert – but I fear this involves sending a lot of data from the database => to my application => back to the database?
Any suggestions?
Example of pipeline:
res = items.aggregate([
{ "$group": { "_id": { "item_id": "$item_id", "title": "$title", "category": "$category" }, "count": { "$sum": 1 } } },
{ "$sort": { "count": -1 } },
{ "$limit": 5 }
])
The schema is basically those 3 fields + a few more that is really not relevant, i.e:
doc = {
"item_id": 1000,
"title": "this is the item title",
"category": "this is the item category"
}
I have tried index on both item_id and all 3 fields with no success.
Aggregation returns result to one document. The result is limited by 16M. The document is returned to app.
If you want to “aggregate” to a collection – use map-reduce.
Not sure whether you can emit structural values – please try. BTW emitting key field is good.