I’m learning how to use the TPL for parellizing an application I have. The application processes ZIP files, exctracting all of the files held within them and importing the contents into a database. There may be several thousand zip files waiting to be processed at a given time.
Am I right in kicking off a separate task for each of these ZIP files or is this an inefficient way to use the TPL?
Thanks.
This seems like a problem better suited for worker threads (separate thread for each file) managed with the ThreadPool rather than the TPL. TPL is great when you can divide and conquer on a single item of data but your zip files are treated individually.
Disc I/O is going to be your bottle neck so I think that you will need to throttle the number of jobs running simultaneously. It’s simple to manage this with worker threads but I’m not sure how much control you have (if nay) over the parallel for, foreach as far as how parallelism goes on at once, which could choke your process and actually slow it down.