I’m using Parallel LINQ, and I’m trying to download many URLs concurrently using essentially code like this:
int threads = 10;
Dictionary<string, string> results = urls.AsParallel( threads ).ToDictionary( url => url, url => GetPage( url ) );
Since downloading web pages is network bound rather than CPU bound, using more threads than my number of processors/cores is very beneficial, because each thread spends most of its time waiting for the network. However, running the above with threads = 2 gives the same performance as threads = 10 on my dual-core machine, so I suspect that the thread count passed to AsParallel is capped at the number of cores.
Is there any way to override this behavior? Is there a similar library available that doesn’t have this limitation?
(I’ve found such a library for Python, but need something that works in .NET)
Do the URLs refer to the same server? If so, it could be that you are hitting the HTTP connection limit instead of the threading limit. There’s an easy way to tell – change your code to:
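One way to separate the two suspects, as a rough sketch rather than the snippet originally posted here: replace the real download with a stand-in that just blocks, and count how many distinct threads actually run it. FakeGetPage, PlinqThreadProbe and the example.invalid URLs are made up for illustration, and the code assumes the released .NET 4 PLINQ API (WithDegreeOfParallelism) instead of the AsParallel( threads ) overload used in the question.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Threading;

public static class PlinqThreadProbe
{
    // Hypothetical stand-in for GetPage: blocks like a slow network call
    // and records which managed thread executed it. No HTTP is involved.
    static string FakeGetPage(string url, ConcurrentDictionary<int, bool> seen)
    {
        seen.TryAdd(Thread.CurrentThread.ManagedThreadId, true);
        Thread.Sleep(100); // simulated network wait
        return "content of " + url;
    }

    // Runs the query and returns the number of distinct threads that ran FakeGetPage.
    public static int CountThreads(IEnumerable<string> urls, int degree,
                                   out Dictionary<string, string> results)
    {
        var seen = new ConcurrentDictionary<int, bool>();
        results = urls.AsParallel()
                      .WithDegreeOfParallelism(degree) // upper bound on concurrent tasks
                      .Select(url => new KeyValuePair<string, string>(url, FakeGetPage(url, seen)))
                      .ToList()                        // forces the parallel Select to run
                      .ToDictionary(pair => pair.Key, pair => pair.Value); // built sequentially
        return seen.Count;
    }

    public static void Main()
    {
        var urls = Enumerable.Range(0, 40)
                             .Select(i => "http://example.invalid/" + i)
                             .ToList();
        Dictionary<string, string> results;
        int used = CountThreads(urls, 10, out results);
        Console.WriteLine("Pages: {0}, distinct threads: {1}", results.Count, used);
    }
}
```

If the thread count comes out close to what you asked for while the real downloads still don't speed up, the per-host connection limit (ServicePointManager.DefaultConnectionLimit on the .NET Framework) is the more likely culprit.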
EDIT: Hmm. I can’t get ToDictionary() to parallelise at all with a bit of sample code. It works fine for Select(url => GetPage(url)) but not ToDictionary. Will search around a bit.
EDIT: Okay, I still can’t get ToDictionary to parallelise, but you can work around that. Here’s a short but complete program:
So, how many threads does this use? 5. Why? Goodness knows. I’ve got 2 processors, so that’s not it – and we’ve specified 10 threads, so that’s not it. It still uses 5 even if I change GetPage to hammer the CPU.
If you only need to use this for one particular task – and you don’t mind slightly smelly code – you might be best off implementing it yourself, to be honest.
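For what it's worth, "implementing it yourself" could look something like the following. This is only a minimal sketch under my own assumptions: ManualDownloader and FetchAll are hypothetical names, and the fetch delegate stands in for the question's GetPage. It starts a fixed number of dedicated threads that drain a shared queue, so the thread count is exactly what you ask for, whatever PLINQ would have chosen.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Threading;

public static class ManualDownloader
{
    // Runs `fetch` over every URL on `threadCount` dedicated threads.
    // `fetch` is a placeholder for GetPage. There is no error handling:
    // an exception thrown by `fetch` on a worker would take down the process.
    public static Dictionary<string, string> FetchAll(
        IEnumerable<string> urls, int threadCount, Func<string, string> fetch)
    {
        var pending = new ConcurrentQueue<string>(urls);
        var results = new ConcurrentDictionary<string, string>();
        var workers = new List<Thread>();

        for (int i = 0; i < threadCount; i++)
        {
            var worker = new Thread(() =>
            {
                string url;
                // Each thread keeps pulling URLs until the queue is empty.
                while (pending.TryDequeue(out url))
                {
                    results[url] = fetch(url); // duplicate URLs simply overwrite
                }
            });
            worker.Start();
            workers.Add(worker);
        }

        // Wait for all workers to finish before reading the results.
        foreach (var worker in workers)
        {
            worker.Join();
        }
        return results.ToDictionary(pair => pair.Key, pair => pair.Value);
    }

    public static void Main()
    {
        var urls = Enumerable.Range(0, 30).Select(i => "http://example.invalid/" + i);
        // Simulated download: sleep instead of touching the network.
        var pages = FetchAll(urls, 10, url =>
        {
            Thread.Sleep(50);
            return "content of " + url;
        });
        Console.WriteLine("Fetched {0} pages", pages.Count);
    }
}
```

Dedicated threads are wasteful for CPU-bound work, but for calls that spend nearly all their time blocked on the network they keep the behaviour predictable.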