Suppose we have a popular site and need to implement mail-like messaging between users.
A typical solution uses two tables:
Users (user_id)
Messages (message_id, sender_id (references user_id), receiver_id (references user_id), subject, body)
This approach has two significant limitations:
- All messages of all users are stored in one table, so that table carries a very high load, which degrades overall database performance.
- When someone sends a message to several users at once, the message is copied once per recipient (recipients_count times).
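To make the second limitation concrete, here is a minimal sketch of the two-table design using Python's built-in sqlite3 module with an in-memory database. SQLite is only a stand-in for whatever database the site actually uses, the table and column names are lowercase versions of those in the question, and `send_message` is a hypothetical helper, not an existing API.

```python
import sqlite3

def create_schema(conn: sqlite3.Connection) -> None:
    # Design 1: a single Messages table with one row per (sender, receiver) pair.
    conn.executescript("""
        CREATE TABLE users (user_id INTEGER PRIMARY KEY);
        CREATE TABLE messages (
            message_id  INTEGER PRIMARY KEY,
            sender_id   INTEGER NOT NULL REFERENCES users(user_id),
            receiver_id INTEGER NOT NULL REFERENCES users(user_id),
            subject     TEXT,
            body        TEXT
        );
    """)

def send_message(conn, sender_id, receiver_ids, subject, body):
    # Hypothetical helper: subject and body are duplicated into one row
    # per recipient, which is the second limitation listed above.
    conn.executemany(
        "INSERT INTO messages (sender_id, receiver_id, subject, body) "
        "VALUES (?, ?, ?, ?)",
        [(sender_id, r, subject, body) for r in receiver_ids],
    )
    conn.commit()

conn = sqlite3.connect(":memory:")
create_schema(conn)
conn.executemany("INSERT INTO users (user_id) VALUES (?)", [(1,), (2,), (3,), (4,)])
send_message(conn, 1, [2, 3, 4], "Hello", "A long body...")
count = conn.execute("SELECT COUNT(*) FROM messages").fetchone()[0]
print(count)  # 3: the body is stored once per recipient
```

Every user's inbox and outbox queries (`WHERE receiver_id = ?` / `WHERE sender_id = ?`) also hit this same table, which is where the first limitation comes from.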
The other solution uses three tables:
Users (user_id)
Sent_messages (sent_id, sender_id (references user_id), subject, body)
Received_messages (sent_id, receiver_id (references user_id), subject, body)
The subject and body of Received_messages are copied from the corresponding fields of Sent_messages.
This approach has the following consequences:
- The database is denormalized, since information is copied from one table to another.
- Users can delete sent or received messages without removing them from the receivers’ or senders’ mailboxes.
- Messages take approximately twice as much space.
- Each table carries roughly half the load.
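A minimal sketch of the three-table design, again using an in-memory SQLite database purely for illustration. It shows the subject and body being copied into Received_messages with `INSERT ... SELECT`, and a receiver deleting their copy without affecting the sender. `send_message` is a hypothetical helper, and leaving `received_messages.sent_id` without a foreign key is an assumption made here so that either side can delete its copy independently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (user_id INTEGER PRIMARY KEY);
    CREATE TABLE sent_messages (
        sent_id   INTEGER PRIMARY KEY,
        sender_id INTEGER NOT NULL REFERENCES users(user_id),
        subject   TEXT,
        body      TEXT
    );
    -- Assumption: no foreign key on sent_id, so the sender may delete
    -- their copy while receivers keep theirs (and vice versa).
    CREATE TABLE received_messages (
        sent_id     INTEGER NOT NULL,
        receiver_id INTEGER NOT NULL REFERENCES users(user_id),
        subject     TEXT,
        body        TEXT,
        PRIMARY KEY (sent_id, receiver_id)
    );
""")

def send_message(conn, sender_id, receiver_ids, subject, body):
    with conn:  # both inserts commit together or not at all
        cur = conn.execute(
            "INSERT INTO sent_messages (sender_id, subject, body) VALUES (?, ?, ?)",
            (sender_id, subject, body))
        sent_id = cur.lastrowid
        # Denormalization: copy subject and body into each recipient's row.
        conn.executemany(
            "INSERT INTO received_messages (sent_id, receiver_id, subject, body) "
            "SELECT sent_id, ?, subject, body FROM sent_messages WHERE sent_id = ?",
            [(r, sent_id) for r in receiver_ids])
    return sent_id

conn.executemany("INSERT INTO users (user_id) VALUES (?)", [(1,), (2,), (3,)])
sid = send_message(conn, 1, [2, 3], "Hi", "Body text")

# Receiver 2 deletes the message from their inbox...
with conn:
    conn.execute("DELETE FROM received_messages WHERE sent_id = ? AND receiver_id = ?",
                 (sid, 2))

# ...while the sender's copy and receiver 3's copy are untouched.
sent_left = conn.execute("SELECT COUNT(*) FROM sent_messages").fetchone()[0]
received_left = conn.execute("SELECT COUNT(*) FROM received_messages").fetchone()[0]
print(sent_left, received_left)  # 1 1
```

Outbox queries only touch `sent_messages` and inbox queries only touch `received_messages`, which is how the load gets split between the two tables.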
So here are my questions:
- Which of the two designs is better for high load and scalability? (I think it’s the second one.)
- Is there another database design that can handle high load? What is it, and what are its limitations?
Thanks!
P.S. I understand that the site would have to be very successful before these scalability issues arise, but I want to know what to do if it comes to that.
UPDATE
For the first versions I’ll be using the design proposed by Daniel Vassallo. If everything goes well, I’ll switch to the second design later. Thanks to Evert for allaying my concerns about it.
You may want to avoid copying the message body multiple times when a message is sent to multiple recipients. Here is another option that you may want to consider:
This model may be more Twitter-like than email-like, but it may come with some advantages.
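One common way to store each message body only once is a junction table that holds a small row per recipient and points back to a single message row. The sketch below is an assumption for illustration, not necessarily the exact model referred to here: the `message_recipients` table, the index, and the `send_message` / `inbox` helpers are all hypothetical, and SQLite is used only so the example runs self-contained.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (user_id INTEGER PRIMARY KEY);
    -- Subject and body are stored exactly once per message.
    CREATE TABLE messages (
        message_id INTEGER PRIMARY KEY,
        sender_id  INTEGER NOT NULL REFERENCES users(user_id),
        subject    TEXT,
        body       TEXT
    );
    -- Hypothetical junction table: one narrow row per recipient, no text copied.
    CREATE TABLE message_recipients (
        message_id  INTEGER NOT NULL REFERENCES messages(message_id),
        receiver_id INTEGER NOT NULL REFERENCES users(user_id),
        PRIMARY KEY (message_id, receiver_id)
    );
    -- Index so a user's inbox can be found without scanning every row.
    CREATE INDEX idx_recipients_receiver ON message_recipients(receiver_id);
""")

def send_message(conn, sender_id, receiver_ids, subject, body):
    with conn:  # message row and recipient rows commit atomically
        cur = conn.execute(
            "INSERT INTO messages (sender_id, subject, body) VALUES (?, ?, ?)",
            (sender_id, subject, body))
        message_id = cur.lastrowid
        conn.executemany(
            "INSERT INTO message_recipients (message_id, receiver_id) VALUES (?, ?)",
            [(message_id, r) for r in receiver_ids])
    return message_id

def inbox(conn, user_id):
    # Join each recipient row back to the single stored copy of the message.
    return conn.execute(
        "SELECT m.message_id, m.sender_id, m.subject "
        "FROM message_recipients r JOIN messages m ON m.message_id = r.message_id "
        "WHERE r.receiver_id = ? ORDER BY m.message_id",
        (user_id,)).fetchall()

conn.executemany("INSERT INTO users (user_id) VALUES (?)", [(1,), (2,), (3,), (4,)])
send_message(conn, 1, [2, 3, 4], "Hello", "A long body...")
body_copies = conn.execute("SELECT COUNT(*) FROM messages").fetchone()[0]
print(body_copies)     # 1
print(inbox(conn, 3))  # [(1, 1, 'Hello')]
```

The trade-off compared with the three-table design is that reading an inbox now needs a join, and deleting a message for one recipient only removes their `message_recipients` row.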
The rules are that:
These are some of the advantages:
For most applications, if you use an optimistic isolation level with the above model, you should not have performance problems even if you expect messages to be exchanged at a rate of a few per second. If, on the other hand, you expect hundreds or thousands of messages per second, it may be worth considering other options.