As we all know, in software development we are sometimes asked to do very ambitious things with technology.
Recently I was asked about the quickest possible way to convert 4000 documents from Word to PDF. The code/software to do the conversion is in place, and it runs on a dedicated server, so the hardware is also there (this is a recurring task). But from a C# performance perspective, what is the best way to do this?
I keep thinking along the lines of breaking this up into chunks (e.g. 40 documents each) and converting the chunks at the same time (i.e. 40 unique documents x 1000 parallel tasks). Is this the right idea, performance-wise? The simplest (and slowest) approach is a serial loop that goes through each document.
What would you recommend? There are no language constraints, so C# 4.0, LINQ, etc. are all available.
1000 parallel tasks? You want to run 1,000 threads concurrently? You’ll spend more time thread switching than doing actual work. If you have a quad-core machine, you should run four threads, each of which is converting a single document at a time.
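To make the "one thread per core" idea concrete, here is a rough sketch rather than a definitive implementation. The `FixedWorkerPool` class, its `ConvertAll` method, and the `convert` delegate are names made up for illustration; `convert` stands in for whatever conversion routine you already have.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading;

public static class FixedWorkerPool
{
    // 'convert' is a hypothetical stand-in for the existing Word-to-PDF routine.
    // Returns the number of documents that were handed to it.
    public static int ConvertAll(IEnumerable<string> paths, Action<string> convert)
    {
        // Thread-safe queue holding every document path still to be converted.
        var queue = new ConcurrentQueue<string>(paths);
        int converted = 0;

        // One worker per logical core (four on a quad-core machine without hyper-threading).
        var workers = new Thread[Environment.ProcessorCount];

        for (int i = 0; i < workers.Length; i++)
        {
            workers[i] = new Thread(() =>
            {
                string path;
                // Each worker converts a single document at a time
                // until the queue is empty.
                while (queue.TryDequeue(out path))
                {
                    convert(path);
                    Interlocked.Increment(ref converted);
                }
            });
            workers[i].Start();
        }

        // Block until every worker has drained the queue.
        foreach (var worker in workers)
        {
            worker.Join();
        }

        return converted;
    }
}
```

The point of the shared queue is that no worker sits idle while another still has a backlog, which fixed chunks of 40 documents would not guarantee.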
Probably the best way to start is to use a simple Parallel.ForEach and let the runtime library worry about scheduling the tasks. You could do the same type of thing with the TPL and tasks.
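Both approaches might look roughly like the following. This is a minimal sketch under some assumptions: the `ParallelConversion` class and the `convert` delegate are hypothetical names standing in for your existing conversion code, and `Task.Factory.StartNew` is used because it is available in .NET 4.0.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

public static class ParallelConversion
{
    // Option 1: Parallel.ForEach partitions the paths and decides
    // how many conversions run concurrently.
    public static void WithParallelForEach(IEnumerable<string> paths, Action<string> convert)
    {
        Parallel.ForEach(paths, path => convert(path));
    }

    // Option 2: start one task per document and wait for all of them.
    // The task scheduler, not this code, decides how many run at once.
    public static void WithTasks(IEnumerable<string> paths, Action<string> convert)
    {
        Task[] tasks = paths
            .Select(path => Task.Factory.StartNew(() => convert(path)))
            .ToArray();

        // Blocks until every task finishes; failures surface as an AggregateException.
        Task.WaitAll(tasks);
    }
}
```

A call could then be as short as `ParallelConversion.WithParallelForEach(Directory.GetFiles(folder, "*.docx"), ConvertToPdf);`, where `folder` and `ConvertToPdf` are placeholders for your own input directory and converter.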
In either case, you let the runtime library figure out how many tasks to execute in parallel.