So I have an interesting use case that I’m stuck trying to find an efficient Mongo query for.
To begin, I have 12,000 categories with 100,000 posts. I need to randomly select 100 pairs of posts from random categories. The categories are chosen at random, but both posts in a pair must belong to the same category.
Users look at each pair and rate it. Once they finish all 100, they fetch another 100 random pairs (preferably none of the pairs they’ve already seen).
So the requirements are:
- Fetch 100 pairs of posts randomly from a random set of categories
Optional requirements:
- Don’t return pairs the user has already rated
Mongo collections:
- Users
- Categories
- Posts
  - CategoryId
  - Ratings (embedded collection in posts)
How would I do this in Mongo… should I move some of this data off of mongo to another db if it’s easier?
Yes. Very interesting question. My suggestion is to put a randomVal field on your post documents. Then you can sort on {CategoryId: 1, randomVal: 1}. The result will be a cursor that groups all the posts by CategoryId but randomly within that grouping. If you conceptually think of this as an array, you can pick all the even-indexed posts and pair them with an odd neighbor to get unique random pairs within categories.
I think that how to select the random pairs from this list will take some experimentation, but my gut instinct is that the best approach would be to have a separate process that periodically caches a collection of pairs which are sorted by a separate randomVal2. The user-facing queries would just increment through this pairs collection 100 at a time.
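To make the idea concrete, here is a rough in-memory sketch in Python of the sort-then-pair approach. It is not tested against a real database: the posts are plain dicts, and the function names (assign_random_vals, build_pairs, fetch_page) and the postIds field are made up for illustration. The comment shows where a compound sort in pymongo would replace the in-memory sort, assuming a collection called posts.

```python
import random
from itertools import groupby


def assign_random_vals(posts, rng):
    """Give every post a randomVal field (set once, or re-rolled periodically)."""
    for post in posts:
        post["randomVal"] = rng.random()


def build_pairs(posts, rng):
    """Pair neighbours within each category, then shuffle pairs via randomVal2."""
    # Against MongoDB this ordering would come from a compound sort, e.g.
    #   db.posts.find().sort([("CategoryId", 1), ("randomVal", 1)])
    # (assumes a collection named "posts"; an index on the same keys would help).
    ordered = sorted(posts, key=lambda p: (p["CategoryId"], p["randomVal"]))

    pairs = []
    for category_id, group in groupby(ordered, key=lambda p: p["CategoryId"]):
        members = list(group)
        # Each even-indexed post is paired with its odd neighbour in the same
        # category; a leftover post in an odd-sized category is skipped.
        for i in range(0, len(members) - 1, 2):
            pairs.append({
                "CategoryId": category_id,
                "postIds": (members[i]["_id"], members[i + 1]["_id"]),
                "randomVal2": rng.random(),
            })

    # Sorting on randomVal2 mixes the categories together.
    pairs.sort(key=lambda pair: pair["randomVal2"])
    return pairs


def fetch_page(pairs, offset, page_size=100):
    """Return the next page of cached pairs, starting at a per-user offset."""
    return pairs[offset:offset + page_size]


if __name__ == "__main__":
    rng = random.Random()
    posts = [{"_id": i, "CategoryId": i % 50} for i in range(1000)]
    assign_random_vals(posts, rng)
    cached_pairs = build_pairs(posts, rng)
    print(len(fetch_page(cached_pairs, 0)), "pairs in the first page")
```

If you store the offset per user, walking through the cached pairs in order also covers the optional requirement, since a user never sees the same pair twice until the cache is rebuilt. After a rebuild you would still need to filter against that user's existing ratings.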