I have about 1500 files on a share for which I need to collect the FileVersionInfo string, so I created a static method in my Gateway like this:
    private static string GetVersionInfo(string filepath)
    {
        FileVersionInfo verInfo = FileVersionInfo.GetVersionInfo(filepath);
        return string.Format("{0}.{1}.{2}.{3}", verInfo.ProductMajorPart, verInfo.ProductMinorPart,
            verInfo.ProductBuildPart, verInfo.ProductPrivatePart).Trim();
    }
I then used it with a FilenameAndVersion struct in a PLINQ call, using WithDegreeOfParallelism since the work is I/O-bound:
    resultList = dllFilesRows.AsParallel().WithDegreeOfParallelism(20)
        .Select(r =>
        {
            var symbolPath = r.Filename;
            return new FilenameAndVersion { Filename = symbolPath, Version = GetVersionInfo(symbolPath) };
        })
        .ToArray();
Later I modified the FilenameAndVersion struct so that it looks up the version itself:
    private struct FilenameAndVersion
    {
        private string _version, _filename;
        public string Version { get { return _version; } }
        public string Filename { get { return _filename; } }
        private void SetVersion()
        {
            FileVersionInfo verInfo = FileVersionInfo.GetVersionInfo(this.Filename);
            this._version = string.Format("{0}.{1}.{2}.{3}", verInfo.ProductMajorPart, verInfo.ProductMinorPart,
                verInfo.ProductBuildPart, verInfo.ProductPrivatePart).Trim();
        }
        public FilenameAndVersion(string filename, string version)
        {
            this._filename = filename;
            this._version = string.Empty;
            SetVersion();
        }
    }
And used it:
    resultList = dllFilesRows.AsParallel().WithDegreeOfParallelism(20)
        .Select(r =>
        {
            var symbolPath = r.Filename;
            return new FilenameAndVersion(symbolPath, String.Empty);
        })
        .ToArray();
The question is: is this going to help me in any way, and is it a good pattern to use?
I forgot to mention that the files are on a server that has RAID 10 with a SAN attached to it.
Sunit
If all of your files are on the same disk, doing it in parallel isn’t going to help at all. A single disk can only read one thing at a time, so you would probably be better off forgetting about parallelism, ditching the threading overhead, and just letting it run sequentially.
All you are going to end up with if you run this in parallel is a disk that just thrashes about all over the place and ends up reading slower overall.
If your files are on different physical drives (or are accessed over a network, e.g. via FTP), then consider taking a bit more control of the parallelism and dividing the work into a single task for each physical disk.
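As a rough sketch of the "one task per physical disk" idea, the files could be grouped by path root and each group read sequentially inside its own task. This assumes the path root (a drive letter or a UNC share) really does correspond to one physical disk, which will not hold for every setup, RAID or SAN storage in particular. PerDiskReader and ReadPerRoot are hypothetical names, and the read delegate stands in for something like the question's GetVersionInfo.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Threading.Tasks;

// Hypothetical helper, not part of the original post.
public static class PerDiskReader
{
    // Groups paths by their root and processes each group in its own task.
    // Files within a group are read one after another, so each root sees
    // only one reader at a time.
    public static Dictionary<string, TResult> ReadPerRoot<TResult>(
        IEnumerable<string> paths, Func<string, TResult> read)
    {
        // Assumption: one path root == one physical disk.
        Task<KeyValuePair<string, TResult>[]>[] tasks = paths
            .Distinct()
            .GroupBy(p => Path.GetPathRoot(p) ?? string.Empty,
                     StringComparer.OrdinalIgnoreCase)
            .Select(group => Task.Run(() =>
                group.Select(p => new KeyValuePair<string, TResult>(p, read(p)))
                     .ToArray()))
            .ToArray(); // materialise so every task is started up front

        // Reading Result blocks until the corresponding task has finished.
        return tasks
            .SelectMany(t => t.Result)
            .ToDictionary(kv => kv.Key, kv => kv.Value);
    }
}
```

With the question's code this might be called as `PerDiskReader.ReadPerRoot(dllFilesRows.Select(r => r.Filename), GetVersionInfo)`. If every file lives under the same root, this collapses to a single sequential task.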
My advice would be to benchmark it before you make any firm commitment to making something parallel.
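A minimal way to do that comparison might look like the sketch below, which times a sequential pass against the PLINQ pass from the question using Stopwatch. VersionBenchmark, Time and Compare are hypothetical names, and the list of file paths is assumed to be supplied by the caller. Keep in mind that whichever pass runs second may benefit from file-system caching, so it is worth repeating the runs and swapping their order before trusting the numbers.

```csharp
using System;
using System.Diagnostics;
using System.Linq;

// Hypothetical benchmark harness, not part of the original post.
public static class VersionBenchmark
{
    // Same formatting as the GetVersionInfo method in the question.
    public static string GetVersionInfo(string filepath)
    {
        FileVersionInfo verInfo = FileVersionInfo.GetVersionInfo(filepath);
        return string.Format("{0}.{1}.{2}.{3}",
            verInfo.ProductMajorPart, verInfo.ProductMinorPart,
            verInfo.ProductBuildPart, verInfo.ProductPrivatePart);
    }

    // Runs the action once and returns the elapsed wall-clock time.
    public static TimeSpan Time(Action action)
    {
        Stopwatch stopwatch = Stopwatch.StartNew();
        action();
        stopwatch.Stop();
        return stopwatch.Elapsed;
    }

    // Times a sequential pass and a parallel pass over the same files.
    public static void Compare(string[] files, int degreeOfParallelism)
    {
        TimeSpan sequential = Time(() =>
        {
            files.Select(f => GetVersionInfo(f)).ToArray();
        });

        TimeSpan parallel = Time(() =>
        {
            files.AsParallel()
                 .WithDegreeOfParallelism(degreeOfParallelism)
                 .Select(f => GetVersionInfo(f))
                 .ToArray();
        });

        Console.WriteLine("Sequential: {0}", sequential);
        Console.WriteLine("Parallel ({0}): {1}", degreeOfParallelism, parallel);
    }
}
```

Calling `VersionBenchmark.Compare(paths, 20)` with the real file list, and then again with a few other degrees of parallelism, should show whether the parallel version is actually faster on this particular storage.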