I am evaluating the use of Azure Table Storage for an application I am building, and I would like to get some advice on…
- whether or not this is a good idea for the application, or
- if I should stick with SQL, and
- if I do go with ATS, what would be a good approach to the design of the storage.
The application is a task-management web application, targeted to individual users. It is really a very simple application. It has the following entities…
- Account (each user has an account.)
- Task (users create tasks, obviously.)
- TaskList (users can organize their tasks into lists.)
- Folder (users can organize their lists into folders.)
- Tag (users can assign tags to tasks.)
There are a few features / requirements that we will also be building which I need to account for…
- We eventually will provide features for different accounts to share lists with each other.
- Users need to be able to filter their tasks in a variety of ways. For example…
- Tasks for a specific list
- Tasks for a specific list which are tagged with “A” and “B”
- Tasks that are due tomorrow.
- Tasks that are tagged “A” across all lists.
- Tasks that I have shared.
- Tasks that contain “hello” in the note for the task.
- Etc.
- Our application is AJAX-heavy with updates occurring for very small changes to a task. So, there is a lot of small requests and updates going on. For example…
- Inline editing
- Click to complete
- Change due date
- Etc…
Because of the heavy CRUD work, and the fact that we really have a list of simple entities, it would be feasible to go with ATS. But, I am concerned about the transaction cost for updates, and also whether or not the querying / filtering I described could be supported effectively.
We imagine numbers starting small (~hundreds of accounts, ~hundreds or thousands of tasks per account), but we obviously hope to grow our accounts.
If we do go with ATS, would it be better to have…
- One table per entity (Accounts, Tasks, TaskLists, etc.)
- Sets of tables per customer (JohnDoe_Tasks, JohnDoe_TaskLists, etc.)
- Other thoughts?
I know this is a long post, but if anyone has any thoughts or ideas on the direction, I would greatly appreciate it!
Azure Table Storage is well suited to a task application. As long as you setup your partition keys and row keys well, you can expect fast and consistent performance with a huge number of simultaneous users.
For task sharing, ATS provides optimistic concurrency to support multiple users accessing the same data in parallel. You can use optimistic concurrency to warn users when more than one account is editing the same data at the same time, and prevent them from accidentally overwriting each-other’s changes.
As to the costs, you can estimate your transaction costs based on the number of accounts, and how active you expect those accounts to be. So, if you expect 300 accounts, and each account makes 100 edits a day, you’ll have 30K transactions a day, which (at $.01 per 10K transactions) will cost about $.03 a day, or a little less than $1 a month. Even if this estimate is off by 10X, the transaction cost per month is still less than a hamburger at a decent restaurant.
For the design, the main aspect to think about is how to key your tables. Before designing your application for ATS, I’d recommend reading the ATS white paper, particularly the section on partitioning. One reasonable design for the application would be to use one table per entity type (Accounts, Tasks, etc), then partition by the account name, and use some unique feature of the tasks for the row key. For both key types, be sure to consider the implications on future queries. For example, by grouping entities that are likely to be updated together into the same partition, you can use Entity Group Transactions to update up to 100 entities in a single transaction — this not only increases speed, but saves on transaction costs as well. For another implication of your keys, if users will tend to be looking at a single folder at a time, you could use the row key to store the folder (e.g. rowkey=”folder;unique task id”), and have very efficient queries on a folder at a time.
Overall, ATS will support your task application well, and allow it to scale to a huge number of users. I think the main question is, do you need cloud magnitude of scaling? If you do, ATS is a great solution; if you don’t, you may find that adjusting to a new paradigm costs more time in design and implementation than the benefits you receive.