Batch
1. read text from a file or from SQL
2. parse the text into words
3. load the words into SQL
Today
.NET 4.0
Step 1 is very fast.
Steps 2 and 3 take about the same time (0.1 second on average) for a given file size.
Step 3 does the insert on a BackgroundWorker, and the next insert waits for the previous one to complete.
Everything else is on the main thread.
A big load repeats this several million times.
Step 3 needs to be serial and in the same order as step 1.
This keeps the PK index of the SQL table from fragmenting.
I tried running step 3 in parallel, and the resulting index fragmentation killed performance.
The data arrives already sorted by the PK.
Other indexes are dropped at the start of the load and rebuilt at the end.
This approach stops being effective when the size of the text varies, because steps 2 and 3 then no longer take about the same time and one ends up waiting on the other.
The text size does change drastically from file to file.
I would like to queue the results of steps 1 and 2 so that step 3 is kept as busy as possible.
Step 3 needs to dequeue the files in the order step 1 enqueued them (even if it has to wait).
The queue needs a maximum size (something like 4 to 10) to bound memory use.
I would like step 2 to run in parallel, with up to 4 files processed concurrently.
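One way to meet these requirements with only the base class library (BlockingCollection has existed since .NET 4.0, Task.Run since 4.5) is to queue the step-2 parse tasks themselves rather than their results: the producer starts each parse and adds its Task to a bounded queue in step-1 order, and the consumer takes the tasks in that same order and waits for each one before inserting. This is a minimal sketch, not a drop-in implementation: ReadText, ParseWords and the in-memory list standing in for the SQL insert are hypothetical, and it is written with top-level statements for brevity.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading.Tasks;

// Hypothetical stand-ins for step 1 (read) and step 2 (parse).
static string ReadText(int id) => "file" + id + " alpha beta gamma";
static string[] ParseWords(string s) =>
    s.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);

const int maxQueued = 4;          // bound on parse tasks waiting in the queue
var loaded = new List<string>();  // stands in for the SQL table; records insert order

// BlockingCollection wraps a ConcurrentQueue by default, so it is FIFO.
var queue = new BlockingCollection<Task<string[]>>(maxQueued);

// Producer: step 1 runs serially; each step-2 parse starts as a thread-pool task.
// Add blocks while maxQueued tasks are already waiting in the queue.
var producer = Task.Run(() =>
{
    try
    {
        for (int fileId = 0; fileId < 20; fileId++)
        {
            string text = ReadText(fileId);
            queue.Add(Task.Run(() => ParseWords(text)));
        }
    }
    finally
    {
        queue.CompleteAdding(); // lets the consumer's loop finish
    }
});

// Consumer: step 3. Tasks come out in the order they were added and .Result
// waits for each one, so inserts stay in file order even if a later parse
// finishes first.
foreach (Task<string[]> parse in queue.GetConsumingEnumerable())
{
    string[] words = parse.Result;
    loaded.Add(words[0]); // hypothetical: replace with the serial SQL insert
}

producer.Wait();
```

Because each parse task is started before Add is called, and the consumer holds one task while it waits on it, up to two more parses than maxQueued can be in flight at once; lower the capacity if the limit of 4 concurrent parses must be strict.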
Moving to .NET 4.5.
I am asking for general guidance on how to implement this.
I am learning that this is a producer-consumer pattern.
If this is not a producer-consumer pattern, please let me know so I can change the title.
I think TPL Dataflow would be a good way to do this:
For step 2, you would use a TransformBlock with MaxDegreeOfParallelism set to 4 and BoundedCapacity also set to 4, so that its queues are empty when working. It will produce the items in the same order as they came in; you don't have to do anything special for that. For step 3, use an ActionBlock with BoundedCapacity set to your limit. Then link the two together and start sending items to the TransformBlock, ideally using something like await stepTwoBlock.SendAsync(…), to asynchronously wait if the queue is full. In code, it would look something like:
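A minimal sketch of that pipeline, assuming hypothetical ReadText and ParseWords helpers and an in-memory list in place of the SQL insert. It keeps the stepTwoBlock name used above; stepThreeBlock is a made-up name. It needs the System.Threading.Tasks.Dataflow library (a NuGet package on .NET 4.5) and uses top-level statements for brevity.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow; // System.Threading.Tasks.Dataflow package

// Hypothetical stand-ins for step 1 (read) and step 2 (parse).
static string ReadText(int id) => "file" + id + " alpha beta gamma";
static string[] ParseWords(string s) =>
    s.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);

var loaded = new List<string>(); // stands in for the SQL table; records insert order

// Step 2: up to 4 parses at once. TransformBlock emits results in input
// order by default, even when a later item finishes first.
var stepTwoBlock = new TransformBlock<string, string[]>(
    text => ParseWords(text),
    new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = 4, BoundedCapacity = 4 });

// Step 3: MaxDegreeOfParallelism defaults to 1, so inserts run one at a time.
// BoundedCapacity is the memory limit from the question (10 here).
var stepThreeBlock = new ActionBlock<string[]>(
    words => loaded.Add(words[0]), // hypothetical: replace with the SQL insert
    new ExecutionDataflowBlockOptions { BoundedCapacity = 10 });

// PropagateCompletion: completing stepTwoBlock completes stepThreeBlock
// once every item has flowed through.
stepTwoBlock.LinkTo(stepThreeBlock, new DataflowLinkOptions { PropagateCompletion = true });

// Step 1: read files in order; SendAsync waits asynchronously while stepTwoBlock is full.
for (int fileId = 0; fileId < 20; fileId++)
{
    await stepTwoBlock.SendAsync(ReadText(fileId));
}

stepTwoBlock.Complete();
await stepThreeBlock.Completion;
```

BoundedCapacity on stepTwoBlock also counts finished results still waiting to be handed to stepThreeBlock, so when the inserts fall behind, parsing slows down instead of piling results up in memory.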