I am definitely missing something very obvious, but can anyone explain why the compression rate is so much better in the second case?
Case 1: very low compression and sometimes even growth in size.
    using (var memoryStream = new System.IO.MemoryStream())
    using (var gZipStream = new GZipStream(memoryStream, CompressionMode.Compress))
    {
        new BinaryFormatter().Serialize(gZipStream, obj);
        gZipStream.Close();
        return memoryStream.ToArray();
    }
Case 2: a lot better compression and I did not get a size growth.
    using (MemoryStream msCompressed = new MemoryStream())
    using (GZipStream gZipStream = new GZipStream(msCompressed, CompressionMode.Compress))
    using (MemoryStream msDecompressed = new MemoryStream())
    {
        new BinaryFormatter().Serialize(msDecompressed, obj);
        byte[] byteArray = msDecompressed.ToArray();
        gZipStream.Write(byteArray, 0, byteArray.Length);
        gZipStream.Close();
        return msCompressed.ToArray();
    }
I have done the mirrored decompression, and in both cases I can deserialize the result back into the source object without any issues.
Here are some stats:
UncSize: 58062085B, Comp1: 46828139B, 81%
UncSize: 58062085B, Comp2: 31326029B, 54%
UncSize: 7624735B, Comp1: 7743947B, 102%
UncSize: 7624735B, Comp2: 5337522B, 70%
UncSize: 1237628B, Comp1: 1265406B, 102%
UncSize: 1237628B, Comp2: 921695B, 74%
You don’t say which version of .NET you’re using. In versions prior to 4.0, GZipStream compresses data on a per-write basis. That is, it compresses the buffer you send to it. In your first example, the Serialize method is likely writing very small buffers to the stream (one field at a time). In your second example, Serialize serializes the entire object to the memory stream, and then the memory stream’s buffer is written to the GZipStream in one big chunk. GZipStream does much better when it has a larger buffer (64K is close to optimum) to work with.

This may still be the case in .NET 4.0. I don’t remember if I tested it.
The way I’ve handled this in the past is with a BufferedStream:
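A minimal sketch of that approach, assuming a 64K buffer. The names `BufferedGzip`, `Compress`, and the `serialize` delegate are hypothetical and only stand in for whatever writes the object (for example `s => new BinaryFormatter().Serialize(s, obj)`):

```csharp
using System;
using System.IO;
using System.IO.Compression;

public static class BufferedGzip
{
    // Hypothetical helper, not part of the framework.
    // 'serialize' is whatever writes the object to a stream, possibly in many small writes.
    public static byte[] Compress(Action<Stream> serialize, int bufferSize = 64 * 1024)
    {
        using (var memoryStream = new MemoryStream())
        {
            using (var gZipStream = new GZipStream(memoryStream, CompressionMode.Compress))
            using (var buffered = new BufferedStream(gZipStream, bufferSize))
            {
                // Small writes accumulate in 'buffered' and reach the
                // compressor in chunks of up to 'bufferSize' bytes.
                serialize(buffered);
            } // Disposing flushes 'buffered' and then finishes the gzip stream.

            // ToArray still works after the MemoryStream has been closed.
            return memoryStream.ToArray();
        }
    }
}
```

With this in place, Case 1 could become `return BufferedGzip.Compress(s => new BinaryFormatter().Serialize(s, obj));`.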
That way, the compressor gets a 64K buffer to work with.
Prior to .NET 4.0, there was no benefit to providing a buffer larger than 64K for GZipStream. I’ve seen some information indicating that the compressor in .NET 4.0 can do a better job of compression with a larger buffer. However, I’ve not tested that myself.
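If you want to check this on your own runtime, something like the following rough probe may help. It is only a sketch: `BufferSizeProbe` and `CompressedLength` are made-up names, and the `writeSize` parameter merely imitates a serializer that writes a few bytes at a time.

```csharp
using System;
using System.IO;
using System.IO.Compression;

public static class BufferSizeProbe
{
    // Writes 'data' in chunks of 'writeSize' bytes through a BufferedStream of
    // 'bufferSize' bytes into a GZipStream, and returns the compressed length.
    public static long CompressedLength(byte[] data, int writeSize, int bufferSize)
    {
        using (var memoryStream = new MemoryStream())
        {
            using (var gZipStream = new GZipStream(memoryStream, CompressionMode.Compress))
            using (var buffered = new BufferedStream(gZipStream, bufferSize))
            {
                for (int offset = 0; offset < data.Length; offset += writeSize)
                {
                    int count = Math.Min(writeSize, data.Length - offset);
                    buffered.Write(data, offset, count);
                }
            } // Disposing flushes the buffer and finishes the gzip stream.

            // Read the result via ToArray, which is still valid after the stream is closed.
            return memoryStream.ToArray().LongLength;
        }
    }
}
```

Calling `BufferSizeProbe.CompressedLength(payload, 16, size)` for a few values of `size` (say 4K, 64K, and 1M) on your real serialized bytes shows whether a larger buffer makes any difference on the .NET version you are running.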