I’m trying to create a directory and copy a file (pdf) inside a Parallel.ForEach.
Below is a simple example:
private static void CreateFolderAndCopyFile(int index)
{
    const string sourcePdfPath = "c:\\testdata\\test.pdf";
    const string rootPath = "c:\\testdata";
    string folderDirName = string.Format("Data{0}", string.Format("{0:00000000}", index));
    string folderDirPath = rootPath + @"\" + folderDirName;
    Directory.CreateDirectory(folderDirPath);
    string desPdfPath = folderDirPath + @"\" + "test.pdf";
    File.Copy(sourcePdfPath, desPdfPath, true);
}
The method above creates a new folder and copies the pdf file to a new folder.
It creates this dir tree:
TESTDATA
  -Data00000000
    -test.pdf
  -Data00000001
    -test.pdf
  ....
  -Data0000000N
    -test.pdf
I tried calling the CreateFolderAndCopyFile method in a Parallel.ForEach loop.
private static void Func<T>(IEnumerable<T> docs)
{
    int index = 0;
    Parallel.ForEach(docs, doc =>
    {
        CreateFolderAndCopyFile(index);
        index++;
    });
}
When I run this code it finishes with the following error:
The process cannot access the file ‘c:\testdata\Data00001102\test.pdf’
because it is being used by another process.
But first it created 1111 new folders and copied test.pdf about 1111 times before I got this error.
What caused this behaviour and how can it be resolved?
EDITED:
The code above was a toy sample; sorry for the hard-coded strings.
Conclusion: the parallel method is slow.
Tomorrow I will try some of the methods from "How to write super-fast file-streaming code in C#?",
especially: http://designingefficientsoftware.wordpress.com/2011/03/03/efficient-file-io-from-csharp/
You are not synchronizing access to index, and that means you have a race on it. Two threads can read the same value of index before either one increments it, so both try to write the same destination file. That’s why you have the error. For illustrative purposes, you can avoid the race and keep this particular design by using Interlocked.Increment.
However, as others suggest, the alternative overload of ForEach that provides a loop index is clearly a cleaner solution to this particular problem.
But when you get it working you will find that copying files is IO bound rather than processor bound, and I predict that the parallel code will be slower than the serial code.
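To make the two suggestions concrete, here is a minimal sketch of both. It is an illustration rather than a drop-in fix. The class name ParallelCopySketch, the method names CopyWithInterlocked and CopyWithLoopIndex, and the source/root parameters are hypothetical and not from the question, and the demo in Main writes to a temp directory instead of c:\testdata.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class ParallelCopySketch
{
    // Same work as CreateFolderAndCopyFile in the question, but the paths are
    // passed in (an assumption made here so the sketch can run anywhere).
    public static void CreateFolderAndCopyFile(string sourcePath, string rootPath, long index)
    {
        string folderDirPath = Path.Combine(rootPath, string.Format("Data{0:00000000}", index));
        Directory.CreateDirectory(folderDirPath);
        string destPath = Path.Combine(folderDirPath, Path.GetFileName(sourcePath));
        File.Copy(sourcePath, destPath, true);
    }

    // Option 1: keep the shared counter, but claim each value atomically.
    public static void CopyWithInterlocked<T>(IEnumerable<T> docs, string sourcePath, string rootPath)
    {
        int next = -1; // first Interlocked.Increment returns 0
        Parallel.ForEach(docs, doc =>
        {
            // Increment and read in one atomic step, so no two
            // iterations can ever see the same index.
            int index = Interlocked.Increment(ref next);
            CreateFolderAndCopyFile(sourcePath, rootPath, index);
        });
    }

    // Option 2: use the ForEach overload that supplies the element's index.
    public static void CopyWithLoopIndex<T>(IEnumerable<T> docs, string sourcePath, string rootPath)
    {
        Parallel.ForEach(docs, (doc, loopState, index) =>
        {
            // index is a long that is unique to this element; no shared state.
            CreateFolderAndCopyFile(sourcePath, rootPath, index);
        });
    }

    public static void Main()
    {
        // Demo data in a throwaway temp directory (hypothetical layout).
        string root = Path.Combine(Path.GetTempPath(), Path.GetRandomFileName());
        Directory.CreateDirectory(root);
        string source = Path.Combine(root, "test.pdf");
        File.WriteAllText(source, "dummy content");

        List<int> docs = Enumerable.Range(0, 100).ToList();

        string rootA = Path.Combine(root, "interlocked");
        string rootB = Path.Combine(root, "loopindex");
        CopyWithInterlocked(docs, source, rootA);
        CopyWithLoopIndex(docs, source, rootB);

        Console.WriteLine("Interlocked folders: " + Directory.GetDirectories(rootA).Length);
        Console.WriteLine("Loop-index folders:  " + Directory.GetDirectories(rootB).Length);

        Directory.Delete(root, true);
    }
}
```

With either variant each iteration gets its own index, so no two threads target the same destination file. Neither one changes the point that the copy is IO bound, so it is worth timing a plain foreach loop against these before keeping the parallel version.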