Given that the CRM 2011 linq provider performs paging automatically behind the scenes.
Is there a way to set an upper limit on the number of records fetched when a linq
query is executed (similar to setting a PagingInfo.Count on a QueryExpression for paging)
I have a scenario where I need approx 20K+ records to be pulled for an update(no I cannot and do not need to filter down the record set further). Ideally I’d prefer to use the Skip & Take operators but since Count is not supported how would you know how many records to skip and when
to stop fetching more records.
Ideally I’d like to use TPL and processes batches of say 3K or 5K records in parallel so that I can get more throughput and don’t have to block. The OrganizationserviceContext is not thread safe from what I know. Are there any good examples that illustrate how to partition the dataset in this case say using Parallel.For or Parallel.ForEach.
How would you partition and would you need to use a different context object for each parition?
Thanks.
UPDATE:
Here is what I came up with:
The idea is to get the total count of records to process and use PLINQ to farm out the processing of each subset of data across tasks using a new OrganizationServiceContext object per task.
static void Main(string[] args)
{
int pagesize = 2000;
// use FetchXML aggregate functions to get total count
// Reference: http://msdn.microsoft.com/en-us/library/gg309565.aspx
int totalcount = GetTotalCount();
int totalPages = (int)Math.Ceiling((double)totalcount / (double)pagesize);
try
{
Parallel.For(0, totalPages, () => new MyOrgserviceContext(),
(pageIndex, state, ctx) =>
{
var items = ctx.myEntitySet.Skip((pageIndex - 1) * pagesize).Take(pagesize);
var itemsArray = items.ToArray();
Console.WriteLine("Page:{0} - Fetched:{1}", pageIndex, itemsArray.Length);
return ctx;
},
ctx => ctx.Dispose()
);
}
catch (AggregateException ex)
{
//handle as needed
}
}
So the way I would do this would be to keep querying the records using skip and take until I run out of records.
Check out my example below, it uses int’s for simplicity, but the approach should still apply to Linq-to-Crm.
So just keep performing your query, skipping previous records, taking the ones you want for that page, then counting at the end to see if you recieved a full page – if you didnt then you have run out of records.
Code
Output: