I have a Windows Azure application in which all read queries of TableA are executed on single partitions for a range of rowkeys. The Partition Keys that facilitate this storage scheme are actually flattened names of objects in a hierarchy, such that the Partition Key is formatted like {root}_{child1}_{child2}_{leaf}. I can understand how it might be beneficial to divide this one big TableA into many tables by using the root dimension of the Partition Keys in the naming of the Tables (so the Partition Key would become {child1}_{child2}_{leaf}).
What I want is to provide access to this data as quickly as I can, from as many simultaneous connections as possible. It would also be great if I could figure out what these limits are or should be.
More specific questions about my proposed change:
- Will this make a difference in scalability, i.e. the number of simultaneous data access requests that can be served without affecting performance dramatically? Or that can be served at the same time at all?
- Will this make a difference in average performance? Potential performance?
If every query specifies a partition key, it makes no difference how many tables those partitions are spread across. In other words, the following are equivalent: one table with a thousand partitions versus a thousand tables each with one partition.
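To make the equivalence concrete, here is a minimal sketch in plain Python (no storage SDK involved) of how the same flattened name would be addressed under each layout. The function names and the example key `acme_eu_web_cpu` are made up for illustration, and the per-root layout assumes the root segment is already a valid table name (alphanumeric, not starting with a digit). In both cases the query ends up as one partition plus a RowKey range; only the table name and the key prefix move around.

```python
from typing import NamedTuple


class Address(NamedTuple):
    """Where a logical partition lives: which table, and under which PartitionKey."""
    table: str
    partition_key: str


def single_table_address(flat_key: str, table: str = "TableA") -> Address:
    # Current layout: one table, PartitionKey is the full flattened name.
    return Address(table, flat_key)


def per_root_address(flat_key: str) -> Address:
    # Proposed layout: the root segment becomes the table name and the
    # remainder becomes the PartitionKey. Assumes the root is a legal table name.
    root, sep, rest = flat_key.partition("_")
    if not sep or not root or not rest:
        raise ValueError(f"expected '{{root}}_{{rest}}', got {flat_key!r}")
    return Address(root, rest)


def partition_range_filter(partition_key: str, row_from: str, row_to: str) -> str:
    # OData-style filter for a RowKey range inside a single partition.
    # Single quotes inside values are escaped by doubling them.
    def quote(value: str) -> str:
        return "'" + value.replace("'", "''") + "'"

    return (
        f"PartitionKey eq {quote(partition_key)} "
        f"and RowKey ge {quote(row_from)} and RowKey le {quote(row_to)}"
    )
```

For example, `single_table_address("acme_eu_web_cpu")` targets table `TableA` with partition `acme_eu_web_cpu`, while `per_root_address("acme_eu_web_cpu")` targets table `acme` with partition `eu_web_cpu`. Either way the filter built by `partition_range_filter` names exactly one partition.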
The main reason I can think of to consider splitting out into multiple tables is that you can delete an entire table in a single operation/transaction, while you can’t do that with a range of partitions within the same table. That means for things like logs, where you may want to delete the older ones after a while, it’s often better to have different tables for different time ranges.
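As a rough sketch of the table-per-time-range idea, assuming one table per calendar month and a hypothetical `Logs` prefix (both are illustrative choices, not anything from the question), the naming and the "which tables are old enough to drop" decision could look like this. It is plain Python and only computes names; the actual delete call depends on whichever storage client you use.

```python
from datetime import date
from typing import Iterable, List


def log_table_name(day: date, prefix: str = "Logs") -> str:
    # One table per calendar month, e.g. Logs201401.
    # Alphanumeric only, and it starts with a letter.
    return f"{prefix}{day:%Y%m}"


def expired_tables(
    existing: Iterable[str], today: date, keep_months: int, prefix: str = "Logs"
) -> List[str]:
    # Keep the current month plus the `keep_months` months before it;
    # everything older is returned as a candidate for deletion.
    month_index = today.year * 12 + (today.month - 1) - keep_months
    cutoff_year, cutoff_month0 = divmod(month_index, 12)
    cutoff = f"{prefix}{cutoff_year:04d}{cutoff_month0 + 1:02d}"
    # Names share a fixed-width YYYYMM suffix, so string order is date order.
    return sorted(
        name
        for name in existing
        if name.startswith(prefix) and len(name) == len(cutoff) and name < cutoff
    )
```

Each name returned by `expired_tables` can then be removed with a single table delete, instead of enumerating and deleting the entities of every old partition.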