Any advice would be appreciated. I'm still learning C#, so I apologize if I miss something obvious. I'm using VS2010 and the application targets .NET 2.0.
I'm looking to speed up two processes as much as possible. The first reads data tables from a server and stores them as cache files; each cache file holds multiple data tables. The second retrieves these data tables from a cache file and loads them into a DataSet.
Originally the process stored the data tables as XML files, and this took forever, both when creating the cache files and when reading them back while running the application. These tables range anywhere from 10MB to 400MB in size.
I set it up so that the cache files are written to and read from my local machine.
I tried binary serialization, which helped a good amount. It took the tables down to about 1/6 of the XML file size and also sped things up, but I'd like to know if there is something faster. I have been looking for a while now and cannot find anything else. I checked out protobuf-net, which looks like a fantastic way to speed up serialization, but from what I found, data tables do not seem to work well with it.
Here are some numbers:
Time to build cache files:
XML - about 2 hours
Binary - about 1 hour
Test case for reading from a cache file:
XML - 3m 40s
Binary - 2m 20s
I know this is a lot of data and I can't expect miracles, but is there another way?
The first rule of optimization is to measure where time is being spent. It may be a good guess that the time is in the serialization code, but there’s nothing like a good profiler session to be sure…
Having said that, the performance gains you see when changing the serialization mechanism do indicate that at least a chunk of time is spent on serialization itself.
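If a full profiler is not at hand, a rough first measurement can come from timing each phase separately. The following is only a minimal sketch: the `PhaseTimer` class and its `Step` delegate are hypothetical names introduced for illustration, and the phases you pass in (fetching from the server, serializing, writing to disk) are assumed to be separable in your code. It sticks to what .NET 2.0 offers, which is why it declares its own delegate instead of using the parameterless `Action` that only arrived in .NET 3.5.

```csharp
using System;
using System.Diagnostics;

// Hypothetical helper: times one phase of the cache build or load.
static class PhaseTimer
{
    // .NET 2.0 has no parameterless Action delegate, so declare one.
    public delegate void Step();

    public static TimeSpan Measure(string label, Step step)
    {
        Stopwatch sw = Stopwatch.StartNew();
        step();                      // run the phase being measured
        sw.Stop();
        Console.WriteLine("{0}: {1} ms", label, sw.ElapsedMilliseconds);
        return sw.Elapsed;
    }
}
```

It could be called as `PhaseTimer.Measure("serialize", delegate { /* your serialization call */ });` once per phase. If most of the two hours turns out to be spent waiting on the server, changing the serializer will not help much.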
XML Serializer is horribly slow for large files. BinaryFormatter is better, but still not exactly a speed demon.
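One thing worth checking before switching libraries: by default a DataSet or DataTable passed to BinaryFormatter still serializes its contents as XML inside the binary stream. Setting `RemotingFormat` to `SerializationFormat.Binary` (available since .NET 2.0) switches it to a true binary layout. Your 1/6 size reduction suggests you may already have this set, so treat the following as a sketch to compare against. The `DataSetCache` class and the file path parameter are hypothetical names, not part of your code, and it assumes the .NET Framework that the question targets.

```csharp
using System.Data;
using System.IO;
using System.Runtime.Serialization.Formatters.Binary;

// Hypothetical helper: writes and reads one cache file holding a whole DataSet.
static class DataSetCache
{
    public static void Save(DataSet ds, string path)
    {
        // Without this, BinaryFormatter embeds the XML form of the data.
        ds.RemotingFormat = SerializationFormat.Binary;

        using (FileStream fs = new FileStream(path, FileMode.Create, FileAccess.Write))
        {
            new BinaryFormatter().Serialize(fs, ds);
        }
    }

    public static DataSet Load(string path)
    {
        using (FileStream fs = new FileStream(path, FileMode.Open, FileAccess.Read))
        {
            return (DataSet)new BinaryFormatter().Deserialize(fs);
        }
    }
}
```

Usage would be `DataSetCache.Save(myDataSet, cachePath);` when building the cache and `DataSet ds = DataSetCache.Load(cachePath);` when reading it back.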
Protocol Buffers are around 6x faster than BinaryFormatter and store the data much more compactly. See this benchmark:
http://theburningmonk.com/2011/08/performance-test-binaryformatter-vs-protobuf-net/
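Marc Gravell (of Stack Overflow) wrote protobuf-net, an implementation of Protocol Buffers for .NET:
http://code.google.com/p/protobuf-net/
You can get it through NuGet.
(Jon Skeet wrote one as well, but I prefer Marc's implementation.)
There is also protobuf-net-data, which is built on protobuf-net and aimed at serializing DataTables and data readers, the case you had trouble with:
https://nuget.org/packages/protobuf-net-data/2.0.5.480
(Also available through NuGet.)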