I have data which is organized hierarchically and I’d like to compute aggregations at several levels within a single MongoDB map/reduce operation. Is there a way to do this?
Example:
{ street: "A", district: "1", city: "Z", nb_users: 1 }
{ street: "A", district: "1", city: "Z", nb_users: 2 }
{ street: "B", district: "1", city: "Z", nb_users: 3 }
{ street: "B", district: "1", city: "Z", nb_users: 2 }
{ street: "C", district: "1", city: "Z", nb_users: 4 }
{ street: "C", district: "1", city: "Z", nb_users: 3 }
{ street: "A", district: "2", city: "Z", nb_users: 5 }
{ street: "B", district: "2", city: "Z", nb_users: 6 }
{ street: "B", district: "2", city: "Z", nb_users: 3 }
Result:
{ street: "A", district: "1", city: "Z", nb_users_street: 3, nb_users_district: 15, nb_users_city: 29 }
{ street: "B", district: "1", city: "Z", nb_users_street: 5, nb_users_district: 15, nb_users_city: 29 }
{ street: "C", district: "1", city: "Z", nb_users_street: 7, nb_users_district: 15, nb_users_city: 29 }
{ street: "A", district: "2", city: "Z", nb_users_street: 5, nb_users_district: 14, nb_users_city: 29 }
{ street: "B", district: "2", city: "Z", nb_users_street: 9, nb_users_district: 14, nb_users_city: 29 }
Thanks for your help!
No, there is no easy way to do this.
As you want to aggregate by street, district and city, you will need to use all of them as part of the key of your emitted objects, so your map function would most probably emit a key made up of the street, district and city, with nb_users as the value.
As the reduce function combines only records with matching keys, you will only be able to combine records where the street, district and city are all the same, which means you won’t be able to calculate the total for a district or city from these emitted objects, as they span multiple streets.
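A minimal sketch of the kind of map/reduce pair described above, using the field names from the question. In MongoDB the map function reads the current document from this and calls the server-provided emit; the runMapReduce helper and the local emit variable below are hypothetical stand-ins, added only so the snippet runs in plain JavaScript without a database.

```javascript
// Sample documents from the question.
const docs = [
  { street: "A", district: "1", city: "Z", nb_users: 1 },
  { street: "A", district: "1", city: "Z", nb_users: 2 },
  { street: "B", district: "1", city: "Z", nb_users: 3 },
  { street: "B", district: "1", city: "Z", nb_users: 2 },
  { street: "C", district: "1", city: "Z", nb_users: 4 },
  { street: "C", district: "1", city: "Z", nb_users: 3 },
  { street: "A", district: "2", city: "Z", nb_users: 5 },
  { street: "B", district: "2", city: "Z", nb_users: 6 },
  { street: "B", district: "2", city: "Z", nb_users: 3 }
];

// Stand-in for the emit() that MongoDB supplies to map functions.
let emit = null;

// Map: the key carries all three levels, so only documents with the same
// street, district and city are ever reduced together.
function map() {
  emit(
    { street: this.street, district: this.district, city: this.city },
    { nb_users: this.nb_users }
  );
}

// Reduce: sum the values emitted under one key.
function reduce(key, values) {
  let total = 0;
  values.forEach(function (v) { total += v.nb_users; });
  return { nb_users: total };
}

// Hypothetical in-memory replacement for the server-side map/reduce loop.
function runMapReduce(input, mapFn, reduceFn) {
  const groups = new Map();
  emit = function (key, value) {
    const id = JSON.stringify(key);
    if (!groups.has(id)) groups.set(id, { key: key, values: [] });
    groups.get(id).values.push(value);
  };
  input.forEach(function (doc) { mapFn.call(doc); });
  return Array.from(groups.values()).map(function (g) {
    return { _id: g.key, value: reduceFn(g.key, g.values) };
  });
}

const streetTotals = runMapReduce(docs, map, reduce);
```

Every result is keyed by one full street/district/city combination (street A in district 1 sums to 3, for example), so there is no key under which a district or city total could accumulate.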
Doing three separate map/reduces into three separate output collections will make the code simpler and easier to understand, and will also remove the redundancy of having nb_users_district and nb_users_city repeated for every street-level row.
In fact, the three separate map/reduce functions would be so simple that you should be able to use MongoDB’s built-in group function, which I believe offers some performance benefits over standard map/reduce.
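As a rough illustration of the three separate aggregations, here is a sketch in plain JavaScript that computes one result set per level. The sumBy helper is hypothetical and only mimics in memory what a one-level group would return; against a real server each call would become its own group or map/reduce run. Note that db.collection.group() was removed in MongoDB 4.2, so on current versions the aggregation pipeline's $group stage would be the likely equivalent.

```javascript
// Sample documents from the question.
const docs = [
  { street: "A", district: "1", city: "Z", nb_users: 1 },
  { street: "A", district: "1", city: "Z", nb_users: 2 },
  { street: "B", district: "1", city: "Z", nb_users: 3 },
  { street: "B", district: "1", city: "Z", nb_users: 2 },
  { street: "C", district: "1", city: "Z", nb_users: 4 },
  { street: "C", district: "1", city: "Z", nb_users: 3 },
  { street: "A", district: "2", city: "Z", nb_users: 5 },
  { street: "B", district: "2", city: "Z", nb_users: 6 },
  { street: "B", district: "2", city: "Z", nb_users: 3 }
];

// Hypothetical helper: sums nb_users for each distinct combination of the
// given key fields, like a single-level group operation would.
function sumBy(input, keyFields) {
  const totals = new Map();
  for (const doc of input) {
    const key = {};
    for (const field of keyFields) key[field] = doc[field];
    const id = JSON.stringify(key);
    const entry = totals.get(id) || Object.assign({}, key, { nb_users: 0 });
    entry.nb_users += doc.nb_users;
    totals.set(id, entry);
  }
  return Array.from(totals.values());
}

// One aggregation per level, each of which would go to its own output collection.
const byStreet = sumBy(docs, ["street", "district", "city"]);
const byDistrict = sumBy(docs, ["district", "city"]);
const byCity = sumBy(docs, ["city"]);
```

With the sample data, byDistrict holds 15 for district 1 and 14 for district 2, and byCity holds a single row of 29 for city Z, so the district and city totals are stored once instead of being repeated on every street row.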