I have a question about designing indexes in MongoDB.
Say I have a users collection and a groups collection.
user {
name : "" ,
age : 19
}
group {
name : "",
members : [],
posts : [ { date : "", author : "", topic : "" }, { date : "", author : "", topic : "" }, ... ]
}
There can be thousands of groups, and each group can have millions of
posts. The operations I perform most frequently are:
- getting posts based on date (70%)
- updating posts (30%)
So, essentially I need to index on date.
My question is:
Should I create a new posts collection like
posts {
name : "", date : "" , author : "" , topic : ""
}
and create a single-field index on date in the posts collection
( db.posts.ensureIndex({ date : 1 }) )
OR
Should I embed the posts inside the group document and create an index
on the embedded field, like db.groups.ensureIndex({ "posts.date" : 1 })?
Which one is more efficient? What's the best practice if this needs to
scale to millions of posts?
Thanks
@Z5h, I think you misunderstood the problem.
The problem is getting the posts of a particular group within a date range, and storing them as efficiently as possible.
After some thinking and research, this is what I found out.
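First, there is a limit on document size (currently 16MB). Since all posts of a group live inside the group document, this design will stop scaling as the number of posts grows: a group with millions of posts will eventually exceed the limit. Also, an index on a field inside an array of subdocuments (a multikey index such as { "posts.date" : 1 }) only helps find which group documents contain a matching post; it cannot make the query return just those posts.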
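To get a feel for how quickly the 16MB limit bites, here is a rough back-of-the-envelope estimate in JavaScript. The 200-byte average per post is an assumption made up for illustration (real sizes depend on field names and contents), and `maxEmbeddedPosts` is a hypothetical helper.

```javascript
// Rough estimate of how many embedded posts fit in one group document.
// ASSUMPTION: the per-post size below is a made-up average for illustration.
const MAX_BSON_DOCUMENT_BYTES = 16 * 1024 * 1024; // MongoDB's 16MB document limit
const ASSUMED_BYTES_PER_POST = 200; // hypothetical average size of one post subdocument

// Hypothetical helper: posts that fit after reserving space for the group's other fields.
function maxEmbeddedPosts(bytesPerPost, overheadBytes = 0) {
  return Math.floor((MAX_BSON_DOCUMENT_BYTES - overheadBytes) / bytesPerPost);
}

console.log(maxEmbeddedPosts(ASSUMED_BYTES_PER_POST)); // 83886
```

Under that assumption a single group document tops out at roughly 84 thousand posts, far short of millions.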
Second, if posts are stored as embedded subdocuments, a query cannot return just the posts of a group that fall within a date range. I would have to fetch the entire posts array and filter it on the client side, which is inefficient. As of now, there is no way to select array elements based on a field of the subdocument (refer this).
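A sketch of what the embedded design leaves to the client, in JavaScript. In the mongo shell, a query such as db.groups.find({ name: "g1", posts: { $elemMatch: { date: { $gte: from, $lt: to } } } }) matches the group and returns the whole document, posts array included; the date filtering then happens in application code. The group document and the `postsInRange` helper below are hypothetical, and dates are assumed to be ISO-8601 strings so that they compare correctly as plain strings.

```javascript
// Sketch of the client-side filtering the embedded design forces.
// ASSUMPTION: dates are ISO-8601 strings ("YYYY-MM-DD"), which sort correctly
// as strings; real code would more likely store Date objects.
function postsInRange(groupDoc, from, to) {
  // The whole posts array has already been transferred from the server;
  // only now can the posts outside [from, to) be dropped.
  return groupDoc.posts.filter((p) => p.date >= from && p.date < to);
}

// Hypothetical group document as it might come back from db.groups.find(...)
const group = {
  name: "g1",
  members: [],
  posts: [
    { date: "2012-01-05", author: "a", topic: "t1" },
    { date: "2012-02-10", author: "b", topic: "t2" },
    { date: "2012-03-15", author: "c", topic: "t3" },
  ],
};

const feb = postsInRange(group, "2012-02-01", "2012-03-01");
console.log(feb.length, "of", group.posts.length, "posts kept"); // 1 of 3 posts kept
```

With millions of posts per group, the ratio of posts transferred to posts kept only gets worse.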
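Hence the better way is to create a separate posts collection, where each post document stores the group it belongs to along with its own data (name, date, author, topic, as in the posts schema above).
This way, I can also create an index on date (or, better, a compound index on the group name and date) and fetch all posts of a group within a date range efficiently.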
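A toy model of why the separate collection works well, in JavaScript. In the mongo shell this would roughly be db.posts.createIndex({ name: 1, date: 1 }) (ensureIndex in older shells) followed by db.posts.find({ name: "g1", date: { $gte: from, $lt: to } }), assuming name holds the group as in the posts schema above. The sorted array and binary search below stand in for MongoDB's real B-tree index and are only an illustration; all function names and data are hypothetical, and dates are assumed to be ISO-8601 strings.

```javascript
// Toy model of a compound index on { name: 1, date: 1 } in the posts collection.
// ASSUMPTION: a sorted array with binary search, not MongoDB's actual B-tree.

// Compare two index keys field by field: group name first, then date.
function compareKeys(a, b) {
  if (a.name !== b.name) return a.name < b.name ? -1 : 1;
  if (a.date !== b.date) return a.date < b.date ? -1 : 1;
  return 0;
}

// Build the "index": one entry per post, sorted by (name, date).
function buildIndex(posts) {
  return posts
    .map((post, i) => ({ name: post.name, date: post.date, pos: i }))
    .sort(compareKeys);
}

// Binary search for the first entry whose key is >= target.
function lowerBound(index, target) {
  let lo = 0;
  let hi = index.length;
  while (lo < hi) {
    const mid = (lo + hi) >> 1;
    if (compareKeys(index[mid], target) < 0) lo = mid + 1;
    else hi = mid;
  }
  return lo;
}

// Posts of one group with from <= date < to, plus how many index entries were read.
function findGroupPosts(posts, index, name, from, to) {
  const result = [];
  let examined = 0;
  for (let i = lowerBound(index, { name, date: from }); i < index.length; i++) {
    examined++;
    const e = index[i];
    if (e.name !== name || e.date >= to) break; // left the key range: stop scanning
    result.push(posts[e.pos]);
  }
  return { result, examined };
}

const posts = [
  { name: "g1", date: "2012-01-05", author: "a", topic: "t1" },
  { name: "g2", date: "2012-02-11", author: "b", topic: "t2" },
  { name: "g1", date: "2012-02-10", author: "c", topic: "t3" },
  { name: "g1", date: "2012-03-15", author: "d", topic: "t4" },
  { name: "g2", date: "2012-02-20", author: "e", topic: "t5" },
];
const index = buildIndex(posts);
const { result, examined } = findGroupPosts(posts, index, "g1", "2012-02-01", "2012-03-01");
console.log(result.map((p) => p.topic).join(", "), examined); // t3 2
```

Because entries are ordered by group first and date second, the scan starts at g1's first February post and stops at the first key past the range. An index on date alone would also have to walk over g2's February posts to answer the same query.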