In my application, we need to develop a FRIENDS relationship table in datastore. And of course a quick solution I’ve thought of would be this:
user = db.ReferenceProperty(User, required=True, collection_name='user')
friend = db.ReferenceProperty(User, required=True, collection_name='friends')
But, what would happen when the friend list grows to a huge number, say few thousands or more ? Will this be too inefficient ?
Performance is always a priority to us. This is very much needed, as we would have few more to follow this similar relationship design.
Please give advice on the best approach to design for FRIENDS relationship table using datastore within App Engine Python environment.
EDIT
Other than FRIENDS relationship, FOLLOWER relationship will be created as well. And I believe it will be very often enough all these relationship to be queries most of the time, for the reason social media oriented of my application tend to be.
For example, If I follow some users, I will get update as news feed on what they will be doing etc. And the activities will be increased over time. As for how many users, I can’t answer yet as we haven’t go live. But I foresee to have millions of users as we go on.
Hopefully, this would help for more specific advice or is there alternative to this approach ?
Your FRIENDS model (and presumably also your FOLLOWERS model) should scale well. The tricky part in your system is actually aggregating the content from all of a user’s friends and followees.
Querying for a list of a user’s is O(N), where N is the number of friends, due to the table you’ve described in your post. However, each of those queries requires another O(N) operation to retrieve content that the friend has shared. This leads to O(N^2) each time a user wants to see recent content. This particular query is bad for two reasons:
INkeyword you’d need to use in order to grab the list of shared items won’t work for more than 30 friends.For this particular problem, I’d recommend creating another table that links each user to each piece of shared content. Something like this:
When it comes time to render the stream of updates, you need an O(N) query (N is the number of items you want to display) to look up all the items shared with the user (ordered by date descending). Keep N small to keep this as fast as possible.
Sharing an item requires creating O(N)
SharedItemswhere N is the number of friends and followers the poster has. If this number is too large to handle in a single request, shard it out to a task queue or backend.