I’m building an AppEngine app in Python.
For the sake of discussion, imagine I’m building a Gmail clone, except with a million short emails per user.
The point is, each user will have a large search index all to themselves; just like Gmail, each user has a personal “search engine” over their own content.
Now imagine that many of these messages belong to multiple users (e.g. mailing list emails or cc:ing a hundred users). Not all, but some reasonable fraction.
Without prematurely optimizing, what is my best bet to store the data and the indexes?
How about storing a list of User keys in each mail message? That’s assuming that a single message won’t be owned by more than a hundred or so users.
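To make the idea concrete, here is a minimal sketch in plain Python rather than against the datastore itself. The `Message` class, the `owners` field and the `messages_for` helper are hypothetical names used only for illustration. On App Engine the `owners` field would be a `db.ListProperty(db.Key)`, and the lookup would be an equality filter on that property instead of a loop.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Message:
    """In-memory stand-in for a datastore entity (hypothetical model)."""
    subject: str
    # Plays the role of a list property holding User keys.
    owners: List[str] = field(default_factory=list)


def messages_for(user_key, messages):
    """Return the messages whose owner list contains user_key."""
    return [m for m in messages if user_key in m.owners]


# Example data: one shared message and one private message.
inbox = [
    Message("Standup notes", owners=["alice", "bob"]),
    Message("Lunch?", owners=["bob"]),
]

bob_subjects = [m.subject for m in messages_for("bob", inbox)]
print(bob_subjects)
```

The shared message is stored once and simply lists both owners, which is the main attraction of this layout when only a modest number of users share each message.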
If you want an unlimited number of user-to-message relationships, you could use another table in which each row links one user to one message.
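A rough sketch of that second layout, again in plain Python with hypothetical names (`UserMessage`, `message_keys_for`, `user_keys_for`). On App Engine each row would be its own entity, typically with two `db.ReferenceProperty` fields, and the loops below would be datastore queries.

```python
from collections import namedtuple

# One row per (user, message) pair; stands in for a separate link entity.
UserMessage = namedtuple("UserMessage", ["user_key", "message_key"])


def message_keys_for(user_key, links):
    """Return the keys of all messages linked to the given user."""
    return [link.message_key for link in links if link.user_key == user_key]


def user_keys_for(message_key, links):
    """Return the keys of all users linked to the given message."""
    return [link.user_key for link in links if link.message_key == message_key]


# Example data: message "m1" is shared, message "m2" is private to bob.
links = [
    UserMessage("alice", "m1"),
    UserMessage("bob", "m1"),
    UserMessage("bob", "m2"),
]

print(message_keys_for("bob", links))
print(user_keys_for("m1", links))
```

Because every relationship is its own row, no single message entity grows with the number of owners, at the cost of an extra lookup to fetch the messages themselves.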
Here are a couple of good articles on modeling relationships like these on GAE:
http://code.google.com/appengine/articles/modeling.html
http://blog.notdot.net/2010/10/Modeling-relationships-in-App-Engine