This question is largely a sanity check. I’ve organized a DB into a collection of stories and a collection of users. Each story has an array of ‘voters’ who have voted on that story. Each user also has an array of ‘friends’. What I want to do is find only the stories that my friends have voted on, and also sort them by the number of friends who voted on each story.
My initial thinking is this: index the ‘voters’ field in the Story objects, then run a map-reduce query over the stories on this indexed field, using the array of ‘friends’ from the user document, with a grouping function that counts the number of times each story shows up. I’m not sure that is correct, and I’m also not sure it would scale. Thoughts and suggestions appreciated.
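To make the intended computation concrete, here is a minimal sketch in plain Python, with in-memory lists standing in for the two collections. The sample data and the helper name `friend_vote_counts` are assumptions for illustration only; this is not MongoDB code, just the counting logic the map-reduce job would need to reproduce.

```python
from collections import Counter

# Hypothetical in-memory stand-ins for the stories and users collections.
stories = [
    {"_id": "s1", "voters": ["alice", "bob", "carol"]},
    {"_id": "s2", "voters": ["bob"]},
    {"_id": "s3", "voters": ["dave"]},
]
user = {"_id": "me", "friends": ["alice", "bob"]}


def friend_vote_counts(stories, friends):
    """Count, per story, how many of the given friends voted on it."""
    friend_set = set(friends)
    counts = Counter()
    for story in stories:
        # Number of this story's voters who are also friends.
        n = len(friend_set.intersection(story["voters"]))
        if n:  # skip stories no friend has voted on
            counts[story["_id"]] = n
    return counts


counts = friend_vote_counts(stories, user["friends"])
# Stories ordered by number of friend votes, highest first.
ranked = counts.most_common()
```

Note that this scans every story; in a real database the index on ‘voters’ would be what limits the scan to stories with at least one friend among the voters.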
I think you should use a background worker that runs your M/R query periodically and stores the results in a collection which you can then query very easily, e.g.
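One possible shape for such a stored result is one document per user with an embedded, pre-sorted list of stories. The sketch below uses plain Python dicts; the field names `stories`, `story_id` and `friend_votes`, and the helper `build_cache_doc`, are assumptions for illustration rather than a fixed schema.

```python
def build_cache_doc(user_id, counts):
    """Turn {story_id: friend_votes} into one pre-sorted document per user."""
    ranked = sorted(counts.items(), key=lambda kv: kv[1], reverse=True)
    return {
        "_id": user_id,
        "stories": [{"story_id": sid, "friend_votes": n} for sid, n in ranked],
    }


# Hypothetical result store, keyed by user id (stands in for a collection).
cache = {}

# The background worker would compute these counts and write the document.
doc = build_cache_doc("me", {"s2": 1, "s1": 2})
cache[doc["_id"]] = doc

# The frontend then needs only a single lookup by user id.
feed = cache["me"]["stories"]
```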
This is trivial to query, but not very flexible. A more flexible structure, avoiding an embedded list:
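A flat layout along these lines could hold one record per (user, story) pair. Again this is only a sketch with plain Python data: the field names, including the extra `total_votes` field, and the helper `stories_for` are assumptions, not something the original schema defines.

```python
from operator import itemgetter

# Hypothetical flat result collection: one record per (user, story) pair.
friend_votes = [
    {"user_id": "me", "story_id": "s1", "friend_votes": 2, "total_votes": 3},
    {"user_id": "me", "story_id": "s2", "friend_votes": 1, "total_votes": 9},
    {"user_id": "you", "story_id": "s1", "friend_votes": 1, "total_votes": 3},
]


def stories_for(user_id, sort_key="friend_votes"):
    """Return a user's records sorted by the given field, highest first."""
    rows = [r for r in friend_votes if r["user_id"] == user_id]
    return sorted(rows, key=itemgetter(sort_key), reverse=True)


# Same records, two different orderings.
by_friends = [r["story_id"] for r in stories_for("me")]
by_total = [r["story_id"] for r in stories_for("me", "total_votes")]
```

Because every record is a top-level row, any stored field can serve as the sort key, which an embedded list does not allow as easily.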
The latter can be used to sort by the number of total votes as well, for example.
M/R used to be ‘the big hammer’: something that should not be run in real time from a web frontend. There were plans to improve this, but I don’t know the current state of that, so I’d play it safe. I also believe this M/R job won’t be very fast once your collections grow big; expect it to take dozens of seconds, if not minutes, rather than milliseconds.