I read a table with more than 1 million records from a database. It takes 3 minutes until I have a populated object in memory. I wanted to optimize this process, so I serialized this object to a file using BinaryFormatter. It created a file about 1/2 GB in size. Afterwards, I deserialized this file back into an in-memory object. That took 11 minutes!
The question: why is it so much faster to read all this data from a database than from a file? Is it possible to optimize the deserialization process somehow?
The database is on the same machine where I ran this test. No other processes were consuming CPU time during the test. The CPU has 4 cores and the machine has 40 GB of memory.
Edit: Code for deserializing:
using (FileStream fs = new FileStream(filename, FileMode.Open))
{
    var bf = new System.Runtime.Serialization.Formatters.Binary.BinaryFormatter();
    var data = (MyType)bf.Deserialize(fs);
    ...
}
Because of the way BinaryFormatter works, it is painfully slow. It embeds a lot of reflection-based metadata in the binary file. I ran some tests against some rather large structures a few years back and found that XmlSerializer produced both smaller files and faster runs than the binary serializer. Go figure!
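For comparison, swapping in XmlSerializer is a small change. A sketch, assuming a public MyType with a parameterless constructor (which XmlSerializer requires) and the same filename variable from the question:

```csharp
using System.IO;
using System.Xml.Serialization;

var serializer = new XmlSerializer(typeof(MyType));

// Serialize to disk.
using (var fs = new FileStream(filename, FileMode.Create))
{
    serializer.Serialize(fs, data);
}

// Deserialize back into memory.
using (var fs = new FileStream(filename, FileMode.Open))
{
    var restored = (MyType)serializer.Deserialize(fs);
}
```

Note that XmlSerializer only serializes public fields and properties, so results depend on how MyType is shaped.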
In either case, the serialization is done via reflection, which is slow. You might consider writing your own serialization mechanism.
I once created my own binary serialization mechanism (using raw file writes and reads), and it performed 20 times faster than the XmlSerializer, which in turn performed faster than the binary serializer. The output was also significantly smaller.
You might want to consider doing something like that.
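A minimal sketch of such a hand-rolled binary format, using BinaryWriter/BinaryReader and a hypothetical Record type (standing in for the question's actual MyType): each field is written in a fixed order with no per-object metadata, which is what makes it compact and fast.

```csharp
using System.Collections.Generic;
using System.IO;

public class Record
{
    public int Id;
    public string Name;

    // Write fields in a fixed order; no type metadata is emitted.
    public void Write(BinaryWriter w)
    {
        w.Write(Id);
        w.Write(Name);
    }

    // Read fields back in exactly the same order.
    public static Record Read(BinaryReader r)
    {
        return new Record { Id = r.ReadInt32(), Name = r.ReadString() };
    }
}

public static class RecordFile
{
    public static void Save(string path, List<Record> records)
    {
        using (var w = new BinaryWriter(File.Create(path)))
        {
            w.Write(records.Count);   // record count header
            foreach (var rec in records)
                rec.Write(w);
        }
    }

    public static List<Record> Load(string path)
    {
        using (var r = new BinaryReader(File.OpenRead(path)))
        {
            int count = r.ReadInt32();
            var list = new List<Record>(count);
            for (int i = 0; i < count; i++)
                list.Add(Record.Read(r));
            return list;
        }
    }
}
```

For files this size, wrapping the underlying stream in a BufferedStream can cut down on I/O calls further. The trade-off is that you must keep the read and write code in sync by hand: any change to the field order or types breaks previously written files.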