I have a WebRole running on a small instance. This WebRole has a method that uploads a large number of files to blob storage. According to the Azure instance specs, a small instance has only 1 core. So when uploading those blobs, will `Parallel.ForEach` give me any benefit over a regular `foreach`?
You would be much better served by focusing on the async versions of the blob storage APIs and/or `Stream` APIs, so that you are I/O bound rather than CPU bound. Anywhere there is a `BeginXXX` API, you should use it by wrapping it with `Task.Factory.FromAsync` and then using a continuation from there. In your specific case you should leverage `CloudBlob.BeginUploadFromStream`. How you get the stream in the first place is just as important, so look for async APIs on that end too.

The only thing that may hold you back from using a small instance after that is that it's capped at 100 Mbps, whereas a medium instance gets 200 Mbps. Then again, you can always leverage elasticity: increase the role count when you need more processing and scale back again when things calm down.
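The wrap-and-continue pattern works the same way for any Begin/End pair. As a minimal sketch, assuming a modern .NET runtime and using `MemoryStream` in place of the real source and blob (`ApmWrapping`, `CopyAsync` and `CopyNextChunkAsync` are hypothetical names), the following copies one stream into another using only `BeginRead`/`EndRead` and `BeginWrite`/`EndWrite`, each wrapped with `FromAsync`, with every next step chained in a continuation so no thread sits blocked waiting on I/O:

```csharp
using System;
using System.IO;
using System.Threading.Tasks;

// Hypothetical helper: copies source into destination using only APM
// (BeginXXX/EndXXX) methods, each wrapped in a Task via FromAsync.
static class ApmWrapping
{
    public static Task CopyAsync(Stream source, Stream destination, int chunkSize)
    {
        var buffer = new byte[chunkSize];
        return CopyNextChunkAsync(source, destination, buffer);
    }

    static Task CopyNextChunkAsync(Stream source, Stream destination, byte[] buffer)
    {
        // Wrap BeginRead/EndRead; the task's result is the number of bytes read.
        Task<int> read = Task<int>.Factory.FromAsync(
            source.BeginRead, source.EndRead, buffer, 0, buffer.Length, null);

        return read.ContinueWith(readTask =>
        {
            int bytesRead = readTask.Result; // throws (faulting this task) if the read failed
            if (bytesRead == 0)
                return Task.CompletedTask;   // end of the source stream

            // Wrap BeginWrite/EndWrite for the chunk that was just read.
            Task write = Task.Factory.FromAsync(
                destination.BeginWrite, destination.EndWrite, buffer, 0, bytesRead, null);

            // Only after the write completes is the buffer reused for the next read.
            return write.ContinueWith(writeTask =>
            {
                writeTask.GetAwaiter().GetResult(); // surface write errors
                return CopyNextChunkAsync(source, destination, buffer);
            }).Unwrap();
        }).Unwrap();
    }
}
```

In the blob scenario the write side would be the upload and the read side whatever produces the source stream. The recursion through `CopyNextChunkAsync` is how a loop looks when each iteration has to be started from a continuation rather than from a blocked thread.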
Here's an example of how you would call `BeginUploadFromStream` using `FromAsync`.

Now, as far as coordinating concurrent processing goes: since you're now kicking off async tasks, you can't count on `Parallel.ForEach` to constrain the maximum concurrency for you. This means you will just have a regular `foreach` on the original thread with a `Semaphore` to limit concurrency. This will provide the equivalent of `MaxDegreeOfParallelism`.

In this sample I am not showing how you should also be getting the source stream asynchronously, but if, for example, you were downloading that stream from a URL somewhere else, you would want to kick that off asynchronously as well and chain the start of the async upload into a continuation on that.
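A minimal sketch of that throttled loop, assuming a modern .NET runtime. So that it runs without the storage SDK, `SimulatedBlob` is a hypothetical stand-in whose methods are assumed to have the same APM shape as `BeginUploadFromStream`/`EndUploadFromStream` (it only waits briefly and counts overlapping uploads); `ThrottledUploader` and `UploadAll` are likewise illustrative names. It uses `SemaphoreSlim`, the lightweight in-process counterpart of `Semaphore`:

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical stand-in for a blob reference with an APM-style upload pair.
// It simulates network latency and records how many uploads overlap.
class SimulatedBlob
{
    static int inFlight, maxInFlight, completed;
    public static int MaxInFlight => Volatile.Read(ref maxInFlight);
    public static int Completed => Volatile.Read(ref completed);

    public IAsyncResult BeginUploadFromStream(Stream source, AsyncCallback callback, object state)
    {
        int now = Interlocked.Increment(ref inFlight);
        int seen; // record the highest overlap observed so far
        while (now > (seen = Volatile.Read(ref maxInFlight)) &&
               Interlocked.CompareExchange(ref maxInFlight, now, seen) != seen) { }

        var tcs = new TaskCompletionSource<bool>(state);
        Task.Delay(20).ContinueWith(_ =>
        {
            Exception error = null;
            try { source.CopyTo(Stream.Null); } // "send" the bytes
            catch (Exception ex) { error = ex; }

            Interlocked.Decrement(ref inFlight);
            Interlocked.Increment(ref completed);
            if (error == null) tcs.SetResult(true); else tcs.SetException(error);
            callback?.Invoke(tcs.Task); // Task implements IAsyncResult
        });
        return tcs.Task;
    }

    public void EndUploadFromStream(IAsyncResult asyncResult) =>
        ((Task)asyncResult).GetAwaiter().GetResult(); // rethrows upload errors
}

static class ThrottledUploader
{
    // Starts one async upload per source, never more than maxConcurrency at once.
    public static void UploadAll(IEnumerable<Stream> sources, int maxConcurrency)
    {
        var throttle = new SemaphoreSlim(maxConcurrency);
        var uploads = new List<Task>();

        foreach (Stream source in sources)
        {
            throttle.Wait(); // block the calling thread until a slot is free

            var blob = new SimulatedBlob(); // with the real SDK: your blob reference
            Task upload = Task.Factory.FromAsync(
                blob.BeginUploadFromStream, blob.EndUploadFromStream, source, null);

            // Free the slot (and the stream) when the upload ends, success or failure.
            uploads.Add(upload.ContinueWith(t =>
            {
                source.Dispose();
                throttle.Release();
                t.GetAwaiter().GetResult(); // rethrow so a failure is not swallowed
            }));
        }

        Task.WaitAll(uploads.ToArray()); // throws AggregateException if any upload failed
    }
}
```

The calling thread only blocks in `throttle.Wait()` while `maxConcurrency` uploads are already in flight; the uploads themselves hold no thread while waiting on the network. Because a slot is released only after its upload has completed, the number of overlapping uploads can never exceed the semaphore's initial count.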
Believe me, I know this is more code than just doing a simple `Parallel.ForEach`, but `Parallel.ForEach` exists to make concurrency for CPU-bound tasks easy. When it comes to I/O, using the async APIs is the only way to achieve maximum I/O throughput while minimizing CPU resources.