I am building a server back-end for a mobile social network using Windows Azure.
I have these 3 entities:
- Users – Stored in SQL Azure
- Threads (sort of relations between 2 users which are then able to send messages to each other) – Stored in SQL Azure
- Messages – Stored in Azure Tables
Because I store Messages in Azure Tables partitioned by Thread ID, I expect good performance when chatting (sending Messages to and reading Messages from a Thread).
But I also need to show each user a list of their most recent Threads (recent = contains the most recent message). In other words, I need to order Threads by the date of their last message when displaying them.
Scanning many different table partitions to find those messages would obviously be a performance killer, so I need to denormalize the data into other table partitions in order to fetch the most recent Threads efficiently.
Based on your experience, what is the best strategy?
Edit: after further thinking, here is a better suggestion (I think):
Have one Message ATS table. This table will house two types of messages: message sent and message received.
Each time a user sends a message, store it in the table twice: once as “Sent” in the sender's partition and once as “Received” in the recipient's partition (or whatever you want to call those types).
Partition all of the messages in the Message table by the following:
PartitionKey = UserId, RowKey = (long.MaxValue - Timestamp.Ticks), formatted as a fixed-width string because RowKey values are compared as strings
As extra properties you can store ThreadId, Sent/Received differentiation, etc.
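To make the key scheme concrete, here is a minimal sketch in Python rather than .NET. It assumes timezone-aware timestamps, and the names `to_ticks`, `reverse_row_key`, `message_entities`, `Direction` and `Body` are made up for illustration; plain dictionaries stand in for table entities, so nothing here is a real storage SDK call.

```python
from datetime import datetime, timezone

LONG_MAX = 2**63 - 1  # same value as .NET long.MaxValue
DOTNET_EPOCH = datetime(1, 1, 1, tzinfo=timezone.utc)  # tick 0 in .NET


def to_ticks(ts: datetime) -> int:
    """Convert a timezone-aware datetime to .NET-style ticks (100 ns units)."""
    delta = ts - DOTNET_EPOCH
    return (delta.days * 86_400 + delta.seconds) * 10_000_000 + delta.microseconds * 10


def reverse_row_key(ts: datetime) -> str:
    """Newer timestamps give smaller keys, so an ascending scan returns newest first."""
    # Zero-pad to 19 digits (the width of LONG_MAX) so string order matches numeric order.
    return str(LONG_MAX - to_ticks(ts)).zfill(19)


def message_entities(sender_id, recipient_id, thread_id, body, ts):
    """Build the two copies of one message: one per user partition."""
    common = {"RowKey": reverse_row_key(ts), "ThreadId": thread_id, "Body": body}
    return [
        {"PartitionKey": sender_id, "Direction": "Sent", **common},
        {"PartitionKey": recipient_id, "Direction": "Received", **common},
    ]
```

Two messages written to the same user's partition in the same tick would collide on RowKey, so a real implementation would probably append a unique suffix to the key; that is left out of this sketch.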
If you want to guarantee that both copies of the message get written, put the message on a queue and let a worker role do the two inserts.
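A rough sketch of that queue-plus-worker idea, in Python with in-memory stand-ins: `queue.Queue` plays the role of the storage queue and a dictionary keyed by (PartitionKey, RowKey) plays the role of the table. The function name `process_pending` and the message field names are assumptions for illustration. The point it tries to show is that if the worker writes both copies as upserts, a message that is delivered twice (for example after a retry) does not create duplicates.

```python
import queue


def process_pending(pending: "queue.Queue", table: dict) -> int:
    """Drain the queue, writing a Sent and a Received copy of each message.

    `table` maps (PartitionKey, RowKey) to the entity properties, so writing
    the same message again overwrites the same two rows (an upsert).
    """
    processed = 0
    while True:
        try:
            msg = pending.get_nowait()
        except queue.Empty:
            return processed
        copies = ((msg["sender"], "Sent"), (msg["recipient"], "Received"))
        for owner, direction in copies:
            table[(owner, msg["row_key"])] = {
                "ThreadId": msg["thread_id"],
                "Direction": direction,
                "Body": msg["body"],
            }
        processed += 1
```

With a real queue the worker would delete the queue message only after both writes succeed, so a crash between the two writes leads to a retry rather than a lost copy.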
This scheme partitions everything by user. You can display all of the messages to or from that user within a time range, and they always come back in descending order by time (newest first).
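For the original question (the list of most recent threads), one way to use this layout is to read the user's partition in RowKey order and keep the first occurrence of each ThreadId. The sketch below is Python over an in-memory list of rows; `recent_threads` is a hypothetical helper, and the explicit `sorted` call only simulates the order in which the table service would return rows from a single partition.

```python
def recent_threads(partition_rows, limit):
    """Return up to `limit` distinct ThreadIds, most recently active first.

    `partition_rows` are the rows of ONE user's partition. Ascending RowKey
    order means newest message first, because the keys are reversed ticks.
    """
    ordered = []
    seen = set()
    for row in sorted(partition_rows, key=lambda r: r["RowKey"]):
        thread_id = row["ThreadId"]
        if thread_id not in seen:
            seen.add(thread_id)
            ordered.append(thread_id)
            if len(ordered) == limit:
                break  # stop reading as soon as enough threads are found
    return ordered
```

The cost depends on how many messages have to be read before `limit` distinct threads show up, so a user with one very chatty thread may need a deeper scan than a user with many quiet ones.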