I want to implement a user follow system. A user can follow other users. I’m considering two approaches. One is that there are followers and followees in User schema, both of them are arrays of user _id. The other one is that there’s only followers in the schema. Whenever I want to find a user’s followers, I have to search all users’ followers array, that is, db.user.find( { followers: "_id" } ). What are the pros and cons of the two approaches? Thanks.
What you’re considering is a classic “many-to-many” relationship. Unlike an RDBMS, where there is a single “correct” normal form for this schema, in MongoDB the correct schema design depends on the way you’ll be using your data, as well as a couple of other factors you haven’t mentioned.
Note that for this discussion I’m assuming that the “follows” relationship is NOT symmetric — that is, that A can follow B without B having to follow A.
1) There are two basic ways to model this relationship in MongoDB.
You can embed an array of followers (or of followed users) in each user document.
You can have a separate collection of “following” documents, like this:
{ user: ObjectID("x"), following: ObjectID("y") }
You’d have one document in this collection for each following relationship. You’d need to have two indexes on this collection, one for “user” and one for “following”.
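To make the two lookups concrete, here is a minimal sketch in plain JavaScript. It uses an in-memory array as a stand-in for the collection, so it runs without a database. The collection name `following`, the helper names, and the string ids are all assumptions of mine for illustration; the shell commands in the comments show roughly what each lookup would correspond to.

```javascript
// In-memory stand-in for a "following" collection (hypothetical name).
// Ids are plain strings here instead of ObjectIDs.
const followingDocs = [];

// One document per relationship; a repeated follow is ignored.
function addFollow(user, target) {
  const exists = followingDocs.some(
    (d) => d.user === user && d.following === target
  );
  if (!exists) followingDocs.push({ user: user, following: target });
}

// Who does `id` follow? Shell equivalent: db.following.find({ user: id })
// Served by an index such as db.following.createIndex({ user: 1 })
function followedBy(id) {
  return followingDocs.filter((d) => d.user === id).map((d) => d.following);
}

// Who follows `id`? Shell equivalent: db.following.find({ following: id })
// Served by an index such as db.following.createIndex({ following: 1 })
function followersOf(id) {
  return followingDocs.filter((d) => d.following === id).map((d) => d.user);
}

addFollow("alice", "bob");
addFollow("carol", "bob");
addFollow("alice", "bob"); // duplicate, ignored
console.log(followersOf("bob")); // [ 'alice', 'carol' ]
console.log(followedBy("alice")); // [ 'bob' ]
```

Each direction of the relationship is a query on a different field, which is why both fields need their own index.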
Note that the two-array suggestion in your question (having arrays of both “following” and “followed” in the user document) is simply a variation of the embedded-array approach.
2) The correct design depends on a few factors that you haven’t mentioned here.
3) The trade-offs are as follows:
The advantages to the embedded array approach are that the code is simpler, and you can fetch the entire array of followed users in a single document. If you index the ‘following’ array, then the query to find all of a user’s followers will be relatively quick, as long as that index fits entirely in RAM. (This is no different from a relational database.)
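As a rough illustration of the embedded approach, the sketch below models user documents as plain objects with a `following` array. The sample data and helper names are hypothetical, and the shell commands in the comments are what I would expect the equivalent queries to look like against the `user` collection from the question.

```javascript
// Hypothetical user documents with an embedded "following" array.
const userDocs = [
  { _id: "alice", following: ["bob", "carol"] },
  { _id: "bob", following: ["carol"] },
  { _id: "carol", following: [] },
];

// Single-document fetch.
// Shell equivalent: db.user.findOne({ _id: id }).following
function followingOf(docs, id) {
  const doc = docs.find((u) => u._id === id);
  return doc ? doc.following : [];
}

// Reverse lookup across all users.
// Shell equivalent: db.user.find({ following: id }), which can use an
// index created with db.user.createIndex({ following: 1 })
function followersOfEmbedded(docs, id) {
  return docs.filter((u) => u.following.includes(id)).map((u) => u._id);
}

console.log(followingOf(userDocs, "alice")); // [ 'bob', 'carol' ]
console.log(followersOfEmbedded(userDocs, "carol")); // [ 'alice', 'bob' ]
```

The first lookup touches one document; the second has to match against every user's array, which is the query that benefits from the index.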
The disadvantages to the embedded array approach occur if you are frequently updating the followers, or if you allow an unlimited number of followers / following.
If you allow an unlimited number of followers/following, then you can potentially overflow the maximum size of a MongoDB document (16 MB). It’s not unheard of for some people to have 100K followers or more. If this is the case, then you’ll need to go to the separate collection approach.
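For a sense of scale, here is a back-of-the-envelope estimate of how much space an array of ids takes. It assumes the standard BSON array encoding (each element stored with a type byte, its index as a decimal string key, and a 12-byte ObjectID) and ignores every other field in the document, so treat the numbers as a rough lower bound rather than a measurement.

```javascript
const MAX_BSON_BYTES = 16 * 1024 * 1024; // 16 MB document limit

// Rough BSON size of an array holding n ObjectIDs.
// Each element: 1 type byte + index key as a decimal string + 1 NUL + 12 id bytes.
// The array itself adds a 4-byte length prefix and a 1-byte terminator.
function estimateIdArrayBytes(n) {
  let bytes = 5;
  for (let i = 0; i < n; i++) {
    bytes += 1 + String(i).length + 1 + 12;
  }
  return bytes;
}

console.log(estimateIdArrayBytes(100000) < MAX_BSON_BYTES); // true
console.log(estimateIdArrayBytes(1000000) > MAX_BSON_BYTES); // true
```

Under these assumptions a single array reaches the limit somewhere between 100K and 1M ids, and sooner if the document carries other data or a second array.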
If you know that there will be frequent updates to the followers, then you’ll probably want to use the separate collection approach as well. The reason is that every time you add a follower, you grow the size of the ‘followers’ array. When it reaches a certain size, it will outgrow the amount of space reserved for it on disk, and MongoDB will have to move the document. This will incur additional write overhead, as all of the indexes for that document will have to be updated as well.
4) If you want to use the embedded array approach, there are a couple of things that you can do to make that more feasible.
First, when you create a new user, you can create the document with a large number of dummy followers pre-created. (E.g., you populate the ‘followers’ array with a large number of entries that you know don’t refer to any actual user, perhaps ID 0.) That way, when you add a new follower, you replace one of the ID 0 entries with a real entry, and the document size doesn’t grow.
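The pre-allocation idea can be sketched like this, again on plain in-memory objects. The placeholder value, slot count, and helper names are assumptions for illustration. The commented shell command shows how I would expect the replacement to be done with the positional `$` operator.

```javascript
const PLACEHOLDER_ID = 0; // hypothetical "no user" id, like the ID 0 above

// Create a user document whose followers array is already at full length.
function newUserDoc(id, slots) {
  return { _id: id, followers: new Array(slots).fill(PLACEHOLDER_ID) };
}

// Overwrite the first placeholder with a real follower.
// Shell equivalent (positional operator), as a sketch:
//   db.user.updateOne({ _id: id, followers: 0 },
//                     { $set: { "followers.$": followerId } })
// Returns false when no placeholder is left.
function fillSlot(userDoc, followerId) {
  if (userDoc.followers.includes(followerId)) return true; // already following
  const slot = userDoc.followers.indexOf(PLACEHOLDER_ID);
  if (slot === -1) return false;
  userDoc.followers[slot] = followerId;
  return true;
}

// Placeholders have to be filtered out whenever the list is read.
function realFollowers(userDoc) {
  return userDoc.followers.filter((id) => id !== PLACEHOLDER_ID);
}

const u = newUserDoc("alice", 3);
fillSlot(u, "bob");
console.log(u.followers.length); // 3
console.log(realFollowers(u)); // [ 'bob' ]
```

One caveat: in a real collection the placeholder would need to have the same BSON type as a real id (for example a zeroed ObjectID rather than the number 0), otherwise replacing it still changes the document size.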
Second, you can limit the number of followers that someone can have, and check for that in the application.
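A minimal sketch of such an application-side check follows. The cap value and function names are hypothetical. Since a check followed by a separate write is not atomic, the second helper builds a filter that only matches while the array is still below the cap (the element at index cap minus one must not exist); passing it to something like `db.user.updateOne(filter, { $addToSet: { followers: followerId } })` is one way I would expect to enforce the limit in a single operation.

```javascript
const MAX_FOLLOWERS = 3; // hypothetical cap, kept tiny for the demo

// Application-side check on an in-memory document.
function addFollowerCapped(userDoc, followerId) {
  if (userDoc.followers.includes(followerId)) return true; // already following
  if (userDoc.followers.length >= MAX_FOLLOWERS) return false; // cap reached
  userDoc.followers.push(followerId);
  return true;
}

// Filter that matches only while the followers array has fewer than
// MAX_FOLLOWERS elements, i.e. index MAX_FOLLOWERS - 1 does not exist yet.
function belowCapFilter(userId) {
  return {
    _id: userId,
    ["followers." + (MAX_FOLLOWERS - 1)]: { $exists: false },
  };
}

const capped = { _id: "alice", followers: [] };
console.log(["b", "c", "d", "e"].map((id) => addFollowerCapped(capped, id)));
// [ true, true, true, false ]
console.log(JSON.stringify(belowCapFilter("alice")));
// {"_id":"alice","followers.2":{"$exists":false}}
```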
Note that if you use the two-array approach in your document, you will cut the maximum number of followers that one person can have (since a portion of the document will be taken up with the array of users that they are following).
5) As an optimization, you can change the “following” documents to be bucketed. So, instead of one document for each following relationship, you might bucket them by user, storing several followed IDs in each document.
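As a rough illustration (the exact document shape and the bucket size are assumptions on my part), a bucketed document could look like `{ user: "x", following: ["y", "z"] }`, with a new bucket opened once the current one is full. The sketch below models that on an in-memory array.

```javascript
const BUCKET_SIZE = 2; // hypothetical bucket capacity, tiny for the demo
const followBuckets = []; // stand-in for the bucketed "following" collection

// Add a relationship to the user's first non-full bucket,
// creating a new bucket when all existing ones are full.
function followBucketed(user, target) {
  const mine = followBuckets.filter((b) => b.user === user);
  if (mine.some((b) => b.following.includes(target))) return; // already following
  let bucket = mine.find((b) => b.following.length < BUCKET_SIZE);
  if (!bucket) {
    bucket = { user: user, following: [] };
    followBuckets.push(bucket);
  }
  bucket.following.push(target);
}

// Reading a user's full list means merging all of that user's buckets.
function followedByBucketed(user) {
  return followBuckets
    .filter((b) => b.user === user)
    .flatMap((b) => b.following);
}

followBucketed("alice", "bob");
followBucketed("alice", "carol");
followBucketed("alice", "dave"); // third follow opens a second bucket
console.log(followBuckets.length); // 2
console.log(followedByBucketed("alice")); // [ 'bob', 'carol', 'dave' ]
```

This keeps every document at a bounded size while still needing far fewer documents than one per relationship.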
6) For more about the ways to model many-to-many, see this presentation:
For more information about the “bucketing” design pattern, see this entry in the MongoDB docs: