I’m using ASP.NET MVC and Azure Table Storage in the local development fabric. My pagination code is very slow when working with a large resultset:
var PageSize = 25;
var qResult2 = from c in svc.CreateQuery<SampleEntity>(sampleTableName)
where c.PartitionKey == "samplestring"
select c;
TableStorageDataServiceQuery<SampleEntity> tableStorageQuery =
new TableStorageDataServiceQuery<SampleEntity>
(qResult2 as DataServiceQuery<SampleEntity>);
var result = tableStorageQuery.ExecuteAllWithRetries()
.Skip((page - 1) * PageSize)
.Take(PageSize);
var numberOfEntities = tableStorageQuery.ExecuteAllWithRetries().Count
ViewData["TotalPages"] = (int)Math.Ceiling((double) numberOfEntities / PageSize);
ViewData["CurrentPage"] = page;
return View(result);
The ViewData is used by the View to calculate paging links using code from Sanderson’s MVC book. For an Azure Table with 1000+ entities, this is very slow. For starters, “Count” takes quite a long time to calculate the total number of entities. If I’m reading my LINQ book correctly, this is because the query doesn’t implement ICollection. The book is “Pro LINQ” by Joseph Rattz.
Even if I set “numberOfEntities” to the known total (e.g. 1500), the paging is still slow for pages above 10. I’m guessing that .Skip and/or .Take are slow. Also, I call ExecuteAllWithRetries() twice, and that can’t be helping if in fact Azure is queried twice.
What strategy should I follow for paging through large datasets with ASP.NET MVC and Azure?
EDIT: I don’t need to know the exact total number of pages.
SkipandTakearen’t the problem here – they will be executed against theIEnumerable, which will already be in memory and thus very quick.ExecuteAllWithRetriesis likely to be the culprit here – you’re basically retrieving all of the entities in the partition from the remote storage in this call, which will result in a very large payload.Pagination in the manner you’re showing is quite difficult in Table Storage. Here are a few issues:
The only order that’s guaranteed is the
PartitionKey/RowKeyorder, so you need to design yourRowKeyswith this in mind.You can perform the
Takein the query (ie, yourqResult2), so this will reduce the number of entities going over the wire.To perform the
Skip-like functionality, you’ll need to use a comparison operator. So you’ll need to know where you are in the result set and query allRowKeysabove that value (ie, add something likewhere c.RowKey > [lastRowKey]to your query)There’s no way to retrieve a count without keeping track of it yourself (or retrieving the entire table like you’re already doing). Depending on your design, you could store the count along with each entity (ie, use an incrementing value) – but just make sure you keep track of concurrent edit conflicts, etc. If you do keep track of the count with each entity, then you can also perform your
Skipusing this as well. Another option would be to store the count in a single value in another entity (you could use the same table to ensure transactional behaviour). You could actually combine these approaches too (store the count in a single entity, to get the optimistic concurrency, and also store it in each entity so you know where it lies).An alternative would be, if possible, to get rid of the count altogether. You’ll notice a couple of large scalable sites do this – they don’t provide an exact list of how many pages there are, but they might let you go a couple of pages ahead/back. This basically eliminates the need for count – you just need to keep track of the
RowKeysfor the next/prev pages.