I would like to describe some specifics of my program and get feedback on which multithreading model would be most applicable. I’ve spent a lot of time reading about ThreadPool, Threads, Producer/Consumer, etc., and have yet to come to a solid conclusion.
I have a list of files (all the same format) but with different contents. I have to perform work on each file. The work consists of reading the file, some processing that takes about 1-2 minutes of straight number crunching, and then writing large output files at the end.
I would like the UI to remain responsive after I initiate the work on the specified files.
Some questions:
- What model/mechanisms should I use? Producer/Consumer, WorkPool, etc.
- Should I use a BackgroundWorker in the UI for responsiveness or can I launch the threading from within the Form as long as I leave the UI thread alone to continue responding to user input?
- How could I take the result or status of the work on each individual file and report it to the UI in a thread-safe way, to give the user feedback as the work progresses? (There can be close to 1000 files to process.)
Update:
Great feedback so far, very helpful. I’m adding some more details that were asked for below:
- Output is to multiple independent files: one set of output files per “work item”, which are then read and processed by another process before the “work item” is complete.
- The work items/threads do not share any resources.
- The work items are processed in part using an unmanaged static library that makes use of the Boost libraries.
Update based on comments:
I don’t agree with the statement that a ThreadPool will not be able to handle the workload you’re encountering… let’s look at your problem and get more specific:
1. You have almost 1000 files.
2. Each file might take up to 2 minutes of CPU-intensive work to process.
3. You want to have parallel processing to increase throughput.
4. You want to signal when each file is complete and update the UI.
Realistically you don’t want to run 1000 threads, because you’re limited by the number of cores you have… and since it’s CPU intensive work you are likely to max out the CPU load with very few threads (in my programs it’s usually optimal to have 2-4 threads per core).
So you shouldn’t load 1000 work items into the `ThreadPool` and expect to see an increase in throughput. You’ll have to create an environment where you’re always running with an optimal number of threads, and this requires some engineering.

I’ll have to contradict my original statement a little bit and actually recommend a Producer/Consumer design. Check out this question for more details on the pattern.
Here is what the Producer might look like:
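A minimal sketch of such a producer, under some assumptions: it uses .NET's built-in `BlockingCollection<string>` as the shared queue (a stand-in for the custom `BlockingQueue` mentioned further down), and the class name `Producer`, its constructor parameters, and the `Run` method are hypothetical names chosen for illustration.

```csharp
using System.Collections.Concurrent;
using System.Collections.Generic;

// Hypothetical producer: pushes every file path onto the shared queue.
public sealed class Producer
{
    private readonly BlockingCollection<string> _queue;
    private readonly IEnumerable<string> _files;

    public Producer(BlockingCollection<string> queue, IEnumerable<string> files)
    {
        _queue = queue;
        _files = files;
    }

    // Thread entry point. Add() blocks only if the collection was created
    // with a bounded capacity and is currently full.
    public void Run()
    {
        foreach (string path in _files)
        {
            _queue.Add(path);
        }
    }
}
```

Note that this sketch never tells the consumers that no more work is coming; that termination signal has to be added separately.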
Here is your consumer:
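A minimal sketch of a matching consumer, again under assumptions: the queue is a `BlockingCollection<string>`, the latch is .NET's `CountdownEvent` created with an initial count equal to the number of files, and `processFile` is a hypothetical delegate standing in for the real read/crunch/write work.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;

// Hypothetical consumer: takes file paths off the queue and processes them.
public sealed class Consumer
{
    private readonly BlockingCollection<string> _queue;
    private readonly CountdownEvent _latch;
    private readonly Action<string> _processFile;

    public Consumer(BlockingCollection<string> queue, CountdownEvent latch,
                    Action<string> processFile)
    {
        _queue = queue;
        _latch = latch;
        _processFile = processFile;
    }

    // Thread entry point. The loop blocks while the queue is empty and ends
    // only after CompleteAdding() has been called and the queue is drained.
    public void Run()
    {
        foreach (string path in _queue.GetConsumingEnumerable())
        {
            _processFile(path);   // the CPU-intensive work for one file
            _latch.Signal();      // one count per finished file (assumption)
        }
    }
}
```

The per-file `Signal()` call is also a natural place to post a progress update to the UI.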
A `CountDownLatch`:

Jicksa’s `BlockingQueue`:
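As a rough idea of what these two helpers do, here is a minimal `Monitor`-based sketch of each. These are illustrative reimplementations written under my own assumptions, not the code referred to above; on .NET 4 and later the built-in `CountdownEvent` and `BlockingCollection<T>` cover the same ground.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

// Illustrative latch: Wait() blocks until Signal() has been called 'count' times.
public sealed class CountDownLatch
{
    private readonly object _sync = new object();
    private int _remaining;

    public CountDownLatch(int count)
    {
        if (count < 0) throw new ArgumentOutOfRangeException(nameof(count));
        _remaining = count;
    }

    public void Signal()
    {
        lock (_sync)
        {
            // Wake all waiters when the count reaches zero.
            if (_remaining > 0 && --_remaining == 0)
            {
                Monitor.PulseAll(_sync);
            }
        }
    }

    public void Wait()
    {
        lock (_sync)
        {
            while (_remaining > 0)
            {
                Monitor.Wait(_sync);
            }
        }
    }
}

// Illustrative unbounded blocking queue: Dequeue() blocks while the queue is empty.
public sealed class BlockingQueue<T>
{
    private readonly Queue<T> _items = new Queue<T>();
    private readonly object _sync = new object();

    public void Enqueue(T item)
    {
        lock (_sync)
        {
            _items.Enqueue(item);
            Monitor.Pulse(_sync); // wake one waiting consumer
        }
    }

    public T Dequeue()
    {
        lock (_sync)
        {
            while (_items.Count == 0)
            {
                Monitor.Wait(_sync);
            }
            return _items.Dequeue();
        }
    }
}
```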
So what does that leave? Well, now all you have to do is start all your threads… you can start them in a `ThreadPool`, as `BackgroundWorker`, or each one as a `new Thread`, and it doesn’t make any difference.

You only need to create one `Producer` and the optimal number of `Consumers` that will be feasible given the number of cores you have (about 2-4 Consumers per core).

The parent thread (NOT your UI thread) should block until all consumer threads are done:
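A minimal sketch of that wiring, under stated assumptions: `FilePipeline.ProcessAll` and `processFile` are hypothetical names, the queue and latch are the built-in `BlockingCollection<string>` and `CountdownEvent`, two consumers per core is picked from the low end of the range above, and `processFile` is assumed not to throw. Here `CompleteAdding()` acts as one possible termination signal for the consumers.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading;

public static class FilePipeline
{
    // Hypothetical entry point. Call it from a background thread,
    // never from the UI thread, because it blocks until all work is done.
    public static void ProcessAll(IReadOnlyList<string> files, Action<string> processFile)
    {
        // Assumption: 2 consumers per core.
        int consumerCount = Math.Max(1, Environment.ProcessorCount * 2);

        using (var queue = new BlockingCollection<string>())
        using (var latch = new CountdownEvent(files.Count))
        {
            var consumers = new List<Thread>();
            for (int i = 0; i < consumerCount; i++)
            {
                var consumer = new Thread(() =>
                {
                    // Ends once CompleteAdding() was called and the queue is empty.
                    foreach (string path in queue.GetConsumingEnumerable())
                    {
                        processFile(path);
                        latch.Signal(); // one count per finished file
                    }
                }) { IsBackground = true };
                consumer.Start();
                consumers.Add(consumer);
            }

            var producer = new Thread(() =>
            {
                foreach (string path in files)
                {
                    queue.Add(path);
                }
                queue.CompleteAdding(); // tells consumers no more work is coming
            });
            producer.Start();

            latch.Wait();      // block until every file has been processed
            producer.Join();
            foreach (Thread consumer in consumers)
            {
                consumer.Join();
            }
        }
    }
}
```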
Please note that the above code is illustrative only. You still need to send a termination signal to the `Consumer` and the `Producer`, and you need to do it in a thread-safe manner.
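One possible way to handle termination, sketched under the assumption that the queue is a `BlockingCollection<string>`: the producer calls `CompleteAdding()` when it has no more files, and a `CancellationToken` covers the case where the user cancels early. The names `Shutdown`, `ConsumeUntilStopped`, and `processFile` are hypothetical.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;

public static class Shutdown
{
    // Consumer loop that exits on either termination signal:
    //  - normal finish: CompleteAdding() was called and the queue is drained
    //  - early cancel:  the token was cancelled (e.g. from a Cancel button)
    public static void ConsumeUntilStopped(
        BlockingCollection<string> queue,
        Action<string> processFile,
        CancellationToken token)
    {
        try
        {
            foreach (string path in queue.GetConsumingEnumerable(token))
            {
                processFile(path);
            }
        }
        catch (OperationCanceledException)
        {
            // Cancellation requested while waiting for work; just exit the loop.
        }
    }
}
```

Both signals are thread-safe: `CompleteAdding()` and `CancellationTokenSource.Cancel()` can be called from any thread. A file that is already being processed still runs to completion unless `processFile` checks the token itself.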