I am trying to loop through all files and folders and perform an action on all files that have a certain extension. This method works fine, but I would like to make it multithreaded: over tens of thousands of files it is really slow, and I imagine multithreading would speed things up. I am just unsure how to use threading in this case.
`doStuff` reads properties (date modified, etc.) from the files and inserts them into a SQLite database. I start a transaction before `scan` is called, so that part is as optimized as it can be.
Answers that provide the theory on how to do it are just as good as full working code answers.
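```csharp
using System;
using System.Collections.Generic;
using System.IO;

private static string[] validTypes = { ".x", ".y", ".z", ".etc" };

// Recursively visits rootDirectory and its subdirectories (skipping the recycle bin)
// and calls doStuff (defined elsewhere) on every file whose extension is in validTypes.
public static void scan(string rootDirectory)
{
    try
    {
        foreach (string dir in Directory.GetDirectories(rootDirectory))
        {
            if (dir.ToLower().IndexOf("$recycle.bin") == -1)
                scan(dir);
        }
        foreach (string file in Directory.GetFiles(rootDirectory))
        {
            // Case-sensitive match: ".X" is not treated as ".x".
            if (!((IList<string>)validTypes).Contains(Path.GetExtension(file)))
            {
                continue;
            }
            doStuff(file);
        }
    }
    catch (Exception)
    {
        // Any exception (e.g. access denied, or one thrown by doStuff)
        // silently abandons the rest of this directory.
    }
}
```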
Assuming that `doStuff` is thread-safe, and that you don't need to wait for the entire scan to finish, you can call both `doStuff` and `scan` on the ThreadPool instead of calling them directly inside the loops. You need to copy the loop variable into a separate local variable before queuing the work, because otherwise the anonymous method would capture the `file` variable itself and would see changes to it throughout the loop. (In other words, if the ThreadPool only executed the task after the loop had moved on to the next file, it would process the wrong file.) This applies to C# 4 and earlier; starting with C# 5, each `foreach` iteration gets its own copy of the loop variable.

However, reading your comment, the main issue here is disk IO, so I suspect that multithreading will not help much.
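The ThreadPool approach described above can be sketched as follows. This is a minimal sketch, not the original answer's code: `ThreadPoolScanner` is a hypothetical class name, and `doStuff` is modelled as a replaceable `Action<string>` field purely so the sketch compiles on its own. It assumes, as above, that `doStuff` is thread-safe (a single SQLite connection and transaction shared across threads may well not be), and `scan` returns as soon as the work is queued, without waiting for the scan to finish.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Threading;

public static class ThreadPoolScanner
{
    private static readonly string[] validTypes = { ".x", ".y", ".z", ".etc" };

    // Hypothetical stand-in for the asker's doStuff. Whatever it does must be
    // thread-safe, because several pool threads may call it at the same time.
    public static Action<string> doStuff = file => { };

    // Queues work items and returns immediately; it does not wait for completion.
    public static void scan(string rootDirectory)
    {
        try
        {
            foreach (string dir in Directory.GetDirectories(rootDirectory))
            {
                if (dir.ToLower().IndexOf("$recycle.bin") == -1)
                {
                    // Copy the loop variable so the lambda captures this iteration's
                    // value (required before C# 5, harmless afterwards).
                    string localDir = dir;
                    ThreadPool.QueueUserWorkItem(_ => scan(localDir));
                }
            }
            foreach (string file in Directory.GetFiles(rootDirectory))
            {
                if (!((IList<string>)validTypes).Contains(Path.GetExtension(file)))
                {
                    continue;
                }
                string localFile = file;
                ThreadPool.QueueUserWorkItem(_ =>
                {
                    // An unhandled exception on a pool thread terminates the process,
                    // so errors are swallowed here, as the original catch block did.
                    try { doStuff(localFile); }
                    catch (Exception) { }
                });
            }
        }
        catch (Exception)
        {
            // Skip directories that cannot be read (e.g. access denied), as in the original.
        }
    }
}
```

Because `scan` returns immediately, nothing tells you when the whole tree has been processed; if you need that (for example, to commit the transaction), you would have to add your own counter or other synchronization.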
Note that `Directory.GetFiles` will perform slowly for directories with large numbers of files, since it needs to allocate an array to hold all of the filenames. If you're using .NET 4.0, you can make it faster by calling the `EnumerateFiles` method instead, which uses an iterator to return an `IEnumerable<string>` that enumerates the directory as you run your loop, rather than building the whole array first.

You can also avoid the recursive `scan` calls with either method by passing the `SearchOption` parameter (`SearchOption.AllDirectories`). This will recursively scan all subdirectories, so you'll only need a single `foreach` loop. Note that this will exacerbate the performance issues with `GetFiles`, so you may want to avoid it before .NET 4.0.
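Combining `EnumerateFiles` with `SearchOption.AllDirectories` (both .NET 4.0 or later) gives a single, non-recursive loop. The sketch below is illustrative only: `EnumeratingScanner` and `FindFiles` are hypothetical names, and the case-insensitive `HashSet` is my own substitution for the question's case-sensitive array lookup.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class EnumeratingScanner
{
    // Assumption: extensions are matched case-insensitively (".X" counts as ".x").
    private static readonly HashSet<string> validTypes =
        new HashSet<string>(new[] { ".x", ".y", ".z", ".etc" }, StringComparer.OrdinalIgnoreCase);

    // Lazily yields every matching file under rootDirectory, including subdirectories.
    public static IEnumerable<string> FindFiles(string rootDirectory)
    {
        // EnumerateFiles streams names as the loop runs instead of filling one big array;
        // SearchOption.AllDirectories replaces the recursive scan() calls.
        foreach (string file in Directory.EnumerateFiles(rootDirectory, "*", SearchOption.AllDirectories))
        {
            // Skip anything inside the recycle bin, like the original directory check.
            if (file.IndexOf("$recycle.bin", StringComparison.OrdinalIgnoreCase) != -1)
                continue;
            if (validTypes.Contains(Path.GetExtension(file)))
                yield return file;
        }
    }
}
```

Each returned path can then be passed to `doStuff`. One caveat: with `SearchOption.AllDirectories`, a subdirectory that cannot be read (for example, an access-denied folder) can throw out of the entire enumeration, whereas the recursive version's `try`/`catch` only skipped that one directory.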