How to design a parallel processing workflow
I have a data analysis scenario.
There are basically four steps:
- Pick up a task, either by reading it from a queue or by receiving a message through an API (perhaps a web service) that triggers the service.
- Submit a request to a remote service based on the parameters from step 1.
- Wait for the remote service to finish, then download the result.
- Process the data downloaded in step 3.
The four steps above look like a sequential workflow.
My question is how I can scale it out.
Every day I might need to run hundreds to thousands of these tasks.
If I could run them in parallel, that would help a lot,
e.g. 20 tasks at a time.
So can Windows Workflow Foundation be configured to run them in parallel?
Thanks.
You may want to use PFX (http://www.albahari.com/threading/part5.aspx), which lets you control how many threads are used for fetching. I also find PLINQ helpful here.
You loop over the list of URLs, perhaps read from a file or database, and in your Select you call a function that does the processing.
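As a rough, language-neutral illustration of that loop, here is a minimal sketch in Python rather than .NET. It assumes each task is identified by a URL and caps the work at 20 tasks at a time, the figure from the question. `run_task`, `run_all` and the example URLs are hypothetical names made up for the sketch, and the body of `run_task` is only a placeholder for steps 2 to 4.

```python
from concurrent.futures import ThreadPoolExecutor

MAX_PARALLEL = 20  # the "20 tasks at a time" from the question


def run_task(url):
    """Hypothetical stand-in for steps 2-4: submit, wait/download, process."""
    payload = "data from " + url  # a real version would call the remote service here
    return len(payload)           # placeholder for the processing result


def run_all(urls, max_parallel=MAX_PARALLEL):
    # The pool never runs more than max_parallel tasks at once;
    # map() returns results in the same order as the input URLs.
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        return list(pool.map(run_task, urls))


if __name__ == "__main__":
    urls = ["http://example.invalid/job/%d" % i for i in range(100)]
    results = run_all(urls)
    print(len(results))
```

In .NET the same cap would typically be expressed with PLINQ's `WithDegreeOfParallelism` or with `ParallelOptions.MaxDegreeOfParallelism` on `Parallel.ForEach`.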
If you can go into more detail, for example whether you want the fetching and the processing to run on different threads, it will be easier to give a more complete answer.
UPDATE:
This is how I would approach this, but I am also using ConcurrentQueue (http://www.codethinked.com/net-40-and-system_collections_concurrent_concurrentqueue) so that I can put data into the queue while reading from it. This way each thread can dequeue safely, without having to lock the collection.
You may want to put the fetched data into another concurrent collection and have that processed separately; it depends on your application's needs.
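The queue-based approach can be sketched roughly as follows, again in Python as a language-neutral illustration. Python's thread-safe `queue.Queue` stands in for ConcurrentQueue here, and `process`, `worker`, `run` and the `STOP` sentinel are hypothetical names introduced for the sketch. Workers dequeue without any explicit locking while the producer is still enqueuing, and results go into a second queue that could be processed separately.

```python
import queue
import threading

WORKERS = 4
STOP = object()  # sentinel telling a worker to exit


def process(item):
    """Hypothetical placeholder for the fetch step."""
    return item * 2


def worker(tasks, results):
    while True:
        item = tasks.get()          # thread-safe; blocks until an item is available
        if item is STOP:
            break
        results.put(process(item))  # hand off to a second queue for later processing


def run(items, workers=WORKERS):
    tasks, results = queue.Queue(), queue.Queue()
    threads = [
        threading.Thread(target=worker, args=(tasks, results))
        for _ in range(workers)
    ]
    for t in threads:
        t.start()
    for item in items:              # producer enqueues while workers are already dequeuing
        tasks.put(item)
    for _ in threads:               # one sentinel per worker so every thread stops
        tasks.put(STOP)
    for t in threads:
        t.join()
    out = []
    while not results.empty():      # safe here because all workers have finished
        out.append(results.get())
    return sorted(out)


if __name__ == "__main__":
    print(run(range(10)))
```

Results arrive in whatever order the workers finish, which is why the sketch sorts them before returning.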