I have an app that exports data as XML files using an XSD schema.
It creates a lot of small files: 29 GB in total, where the average file size is 0.3 MB or 10 KB.
It creates about 140,000 files. It uses
Enumerable.Range(1, gs.Count)
    .AsParallel()
    .ForAll
to save independent data to files.
Using the MS profiler, I decreased the execution time to 30 minutes. It writes up to 20 MB/s at peak.
The top entries in the final profiler results are:
System.IO.File.Open(string,valuetype System.IO.FileMode) 30.14%
System.IDisposable.Dispose() 22.09%
System.Xml.Serialization.XmlSerializer.Serialize(class System.IO.Stream,object) 13.42%
The code is:
using (Stream streamw = File.Open(fileName, FileMode.Create))
{
    formatter.Serialize(streamw, this);
}
where formatter is:
static XmlSerializer formatter = new XmlSerializer(typeof(XItemDesc));
So the questions are:
1) Do the profiler's results mean that I have reached the maximum HDD performance?
2) Is it expected for Dispose to be in second place?
Well, maybe the performance bottleneck is created by Parallel. Keep in mind that requests to a single HDD are not executed in parallel; they are queued.
If your files are in different locations, such as different network shares or different physical hard drives, you can get a performance increase from Parallel.
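One way to test whether Parallel is the bottleneck is to cap the number of concurrent writers and compare timings. The following is only a sketch: the XItemDesc class, the output folder, the item list, and the degree of 2 are all assumptions made for illustration, not values from the question.

```csharp
using System;
using System.IO;
using System.Linq;
using System.Xml.Serialization;

// Hypothetical stand-in for the XItemDesc type from the question.
public class XItemDesc
{
    public int Id { get; set; }
    public string Name { get; set; }
}

public static class Program
{
    // One shared serializer, as in the question.
    static readonly XmlSerializer formatter = new XmlSerializer(typeof(XItemDesc));

    public static void Main()
    {
        // Assumed output folder and sample data, for illustration only.
        string dir = Path.Combine(Path.GetTempPath(), "xml-export-demo");
        Directory.CreateDirectory(dir);
        var gs = Enumerable.Range(1, 100)
            .Select(i => new XItemDesc { Id = i, Name = "item" + i })
            .ToList();

        Enumerable.Range(0, gs.Count)
            .AsParallel()
            .WithDegreeOfParallelism(2) // assumed value; tune it by measuring
            .ForAll(i =>
            {
                string fileName = Path.Combine(dir, gs[i].Id + ".xml");
                using (Stream streamw = File.Open(fileName, FileMode.Create))
                {
                    formatter.Serialize(streamw, gs[i]);
                }
            });

        // Report how many XML files are now in the folder.
        Console.WriteLine(Directory.GetFiles(dir, "*.xml").Length);
    }
}
```

If the total time does not get worse (or improves) with a lower degree of parallelism, the disk queue rather than the CPU is probably the limit.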
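Another thing: you do not need to call Dispose() yourself in addition to using. The using statement is syntactic sugar for a try/finally block that calls Dispose() on the stream, even if an exception is thrown. The Dispose time in the profile is most likely the stream flushing its buffer and closing the file handle.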
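Written out explicitly, disposing the stream looks roughly like the sketch below; the C# using statement is defined as shorthand for this try/finally pattern. The SaveExplicit helper, the XItemDesc stand-in, and the temp file name are hypothetical and exist only to make the example self-contained.

```csharp
using System;
using System.IO;
using System.Xml.Serialization;

// Hypothetical stand-in for the XItemDesc type from the question.
public class XItemDesc
{
    public int Id { get; set; }
}

public static class DisposeDemo
{
    static readonly XmlSerializer formatter = new XmlSerializer(typeof(XItemDesc));

    // Hypothetical helper: same work as the using block, with Dispose spelled out.
    public static void SaveExplicit(XItemDesc item, string fileName)
    {
        Stream streamw = File.Open(fileName, FileMode.Create);
        try
        {
            formatter.Serialize(streamw, item);
        }
        finally
        {
            // Flushes any buffered bytes and closes the file handle.
            streamw.Dispose();
        }
    }

    public static void Main()
    {
        string fileName = Path.Combine(Path.GetTempPath(), "dispose-demo.xml");
        SaveExplicit(new XItemDesc { Id = 1 }, fileName);

        // Check that the file was written and is not empty.
        Console.WriteLine(new FileInfo(fileName).Length > 0);
    }
}
```

Because the final flush and the handle close happen inside Dispose(), it is plausible for it to rank high in a profile dominated by file I/O.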
If the partition on which the files are located is NTFS, please also consider the performance drawbacks of NTFS with many small files.
See: NTFS performance and large volumes of files and directories