Is there a way around 500 entities / second / partition with ATS (Azure Table Storage)? OK with dirty reads. If in insert is not immediately available for read then OK.
Looking to move some large tables from SQL to ATS.
-
Scale: Because of these tables the size is bumping the 150 GB limit of SQL Azure
-
Insert speed: Inverted index for query speed. Insert order is not
sorted by the table clustered index which causes rapid SQL table
fragmentation. ATS most likely has an insert advantage over SQL. -
Cost: ATS has a lower monthy cost. But ATS has a higher load cost as millions of rows and cannot batch as the order of the load is not by partition.
-
Query speed: A search is almost never on just one partitionKey. A search will have a SQL component and zero or more ATS components. This ATS query is always by partitionKey and returning rowKeys. Raw search on partitionKey is fast the problem is the time to return the entities (rows). A given partitionKey will have on average 1,000 rowKeys which is 2 seconds at 500 entities / second / partition. But there will be some partitionKeys with over 100,000 rowKeys which equates to over 3 minutes. Return 10,000 rows at a time and in SQL and no query is over 10 seconds as with the power of joins don’t have to bring down 100,000 rows to have those rows considered in the where.
-
Is there a was around this select entity speed with ATS? For scale and insert speed would like to go to ATS.
Windows Azure Storage Abstractions and their Scalability Targets
How to get most out of Windows Azure Tables
Designing a Scalable Partitioning Strategy for Windows Azure Table Storage
Turn entity tracking off for query results that are not going to be modified:
context.MergeOption = MergeOption.NoTracking;
One potential workaround is to stripe the data across multiple partitions and/or tables, perform queries across all the (sub)partitions in parallel and merge the results.
For example, for striping across partitions, prepending the partition key with a single digit can multiple the scalability of the partition 10 times.
So a partition key, say ABCDEFGH, could be sub partitioned 0ABCDEFGH to 9ABCDEFGH.
Writes are made to a partition, with the prefix digit generated either randomly or in round robin fashion.
Reads would query across all 10 partitions in parallel and merge the results.
For striping across tables, one of N tables can be written to randomly or in round robin fashion and queried similarly in parallel.