I have a set of XML files that I want to load into memory so that I can process them.
I am loading the files into a Collection, and it seems to be a lot faster to load them on a single thread than to use the thread pool.
I would have thought this would have been the other way around.
Why is using multiple threads to load files into memory significantly slower than iterating through the file list and loading each file one after another on a single thread?
This is with C# on .NET 3.5.
The code:
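    ICollection<XmlDocument> xmlFilesToProcess = new Collection<XmlDocument>();
    foreach (FileInfo fileInfo in fileList)
    {
        // Queue one work item per file; the path is passed as state
        // rather than captured from the loop variable.
        ThreadPool.QueueUserWorkItem(
            (o) =>
            {
                XmlDocument doc = new XmlDocument();
                doc.Load((string)o);
                // Collection<T> is not thread-safe, so guard the shared state.
                lock (xmlFilesToProcess)
                {
                    xmlFilesToProcess.Add(doc);
                    counter++;
                }
            }, fileInfo.FullName);
    }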
Without seeing the code, I would guess it has to do with the fact that reading from disk is the slow part of the operation. Since the disk can really only read one file at a time, the disk becomes the bottleneck, and adding threads just adds more requests competing for it.
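If the disk is indeed the bottleneck, one thing worth trying is to keep the reads on a single thread and hand only the parsing to the thread pool. The following is a minimal sketch of that idea, not a measured fix: `XmlBatchLoader` and `LoadAll` are hypothetical names, it assumes each file fits comfortably in memory as a byte array, and whether it beats the plain single-threaded loop depends on how much time parsing takes relative to reading.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Threading;
using System.Xml;

// Hypothetical helper, not part of the original code.
static class XmlBatchLoader
{
    // Reads files one at a time on the calling thread, then parses the
    // in-memory bytes on the thread pool. Result order is not guaranteed.
    public static List<XmlDocument> LoadAll(IList<string> paths)
    {
        List<XmlDocument> docs = new List<XmlDocument>();
        if (paths.Count == 0)
            return docs;

        int pending = paths.Count;      // work items still running
        Exception firstError = null;    // first parse failure, if any

        using (ManualResetEvent allDone = new ManualResetEvent(false))
        {
            foreach (string path in paths)
            {
                // Disk access stays on one thread, so reads do not compete.
                byte[] bytes = File.ReadAllBytes(path);

                ThreadPool.QueueUserWorkItem(delegate(object state)
                {
                    try
                    {
                        XmlDocument doc = new XmlDocument();
                        using (MemoryStream ms = new MemoryStream((byte[])state))
                        {
                            doc.Load(ms);
                        }
                        lock (docs)
                        {
                            docs.Add(doc);
                        }
                    }
                    catch (Exception ex)
                    {
                        // An unhandled exception on a pool thread would end
                        // the process, so record it and report it later.
                        Interlocked.CompareExchange(ref firstError, ex, null);
                    }
                    finally
                    {
                        if (Interlocked.Decrement(ref pending) == 0)
                            allDone.Set();
                    }
                }, bytes);
            }

            // Block until every queued parse has finished.
            allDone.WaitOne();
        }

        if (firstError != null)
            throw new InvalidOperationException("At least one file failed to parse.", firstError);

        return docs;
    }
}
```

Calling `XmlBatchLoader.LoadAll(paths)` with a list of full paths returns once every document is parsed, which the original snippet does not wait for. Timing both versions with `System.Diagnostics.Stopwatch` on the real file set is the only way to know whether the extra threads help at all.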