Say I have an object called data which contains a variety of information. Let's say, for the sake of argument, that there is quite a lot of stuff within the data graph.
If I serialise it using BinaryFormatter, I get a file of, say, 5 MB.
If I wrap the serialisation stream in a GZipStream, I get a much smaller file of, say, 1 MB.
I can, if I want, encrypt the stream while compressing it, or encrypt the stream without compressing it.
The issue is: I need to know what was done during serialisation so that I know what to do when I deserialise it.
One technique would be to use a different file extension. For example, an uncompressed, unencrypted file might have a .dat extension, .zdat for compressed, .cdat for encrypted, and .czdat for compressed and encrypted.
This would work, but it introduces a potential problem: what if the user changes the extension? It also means that if I want to associate the files with my application in Windows, there are 4 extensions to register instead of 1, quadrupling the risk of collisions with existing associations.
If I wrap my data object in a simple class:
[Serializable]
public class SerialisationContainer
{
    public string SerialisedData { get; private set; }
    public bool Compressed { get; private set; }
    public bool Encrypted { get; private set; }

    public SerialisationContainer()
    {
        // etc...
    }

    public object GetObject()
    {
        // etc...
    }
}
then I'm serialising an object that contains a serialised stream, which may be compressed and/or encrypted. At that level we neither know nor care, because the meta-information is stored in the SerialisationContainer.
What do you think? I'm just curious what you think of this method and what you do in similar situations. I suspect the above method is a very wasteful way of doing what I want: I would need to serialise my data graph to a memory stream, convert it to a string, place the string inside my container, and then serialise it again.
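The question leaves open how the serialised bytes become a string. Binary serialiser output is not valid text, so a common choice (assumed here) is Base64. This minimal sketch, with a hypothetical class name, shows that this step alone inflates the payload by a third before the container is serialised a second time.

```csharp
using System;

public static class StringContainerCost
{
    // Assumption: the serialised bytes are stored in the string field as Base64,
    // because arbitrary binary data cannot be stored as-is in a .NET string.
    public static string ToText(byte[] serialised) => Convert.ToBase64String(serialised);

    public static byte[] FromText(string text) => Convert.FromBase64String(text);

    // Base64 emits 4 characters for every 3 input bytes, rounded up.
    public static long EncodedLength(long byteCount) => 4 * ((byteCount + 2) / 3);
}
```

Taking the 5 MB example as 5,000,000 bytes, `EncodedLength` gives 6,666,668 characters; since .NET strings are UTF-16, that is roughly 13 MB in memory for the intermediate string.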
Another issue is the length of the string SerialisedData. In the example I gave we only have about 5 MB of binary data, but what about when it starts getting larger? As far as I know, the upper bound for a string on a 64-bit OS is around 2 GB, and significantly less on a 32-bit OS. Do streams have such a limitation? Since streams are written in blocks of bytes, it makes sense that they wouldn't.
First of all, the lazy solution: you don't have to serialize directly to a file. You can serialize to memory, then write a file that has 1 byte for the format followed by the serialized data.
Second, you can get a little smarter: open a file, write one byte to it (the format), then serialize into the same stream. To deserialize, read one byte to figure out the format, and then pass the stream to the deserializer; it will only read the data after that one byte.
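A minimal sketch of the lazy approach, assuming the serializer is supplied as a delegate (for example `s => new BinaryFormatter().Serialize(s, data)` on frameworks where BinaryFormatter is still available; it is obsolete and disabled by default in recent .NET versions). The class name and the format codes 0 and 1 are hypothetical, and only compression is covered to keep it short.

```csharp
using System;
using System.IO;
using System.IO.Compression;

public static class LazyFormatFile
{
    // Hypothetical format codes for the leading byte.
    public const byte Plain = 0;
    public const byte GZip = 1;

    // Serialize into memory first, then write one format byte followed by the data.
    public static void Save(Stream output, Action<Stream> serialize, bool compress)
    {
        var buffer = new MemoryStream();
        if (compress)
        {
            // leaveOpen keeps `buffer` usable after the GZip footer is written.
            using (var gz = new GZipStream(buffer, CompressionMode.Compress, leaveOpen: true))
                serialize(gz);
        }
        else
        {
            serialize(buffer);
        }

        output.WriteByte(compress ? GZip : Plain);
        buffer.WriteTo(output); // copies the whole MemoryStream regardless of Position
    }

    // Read the format byte, then decode the remaining bytes accordingly.
    public static T Load<T>(Stream input, Func<Stream, T> deserialize)
    {
        int format = input.ReadByte(); // -1 if the stream is empty
        switch (format)
        {
            case Plain:
                return deserialize(input);
            case GZip:
                using (var gz = new GZipStream(input, CompressionMode.Decompress, leaveOpen: true))
                    return deserialize(gz);
            default:
                throw new InvalidDataException($"Unknown format byte {format}.");
        }
    }
}
```

Writing to disk is then `using (var file = File.Create(path)) LazyFormatFile.Save(file, serialize, compress: true);`. The whole payload sits in memory once before it reaches the file, which is what makes this the lazy option.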
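The smarter variant can be sketched as follows. It writes the format byte straight to the output stream and then stacks the optional layers on that same stream, compressing before encrypting because ciphertext barely compresses. The `Format` flag layout, the class name, and passing the serializer as a delegate are assumptions for illustration. Key and IV handling is simplified to a caller-supplied `Aes` instance; a real file format would more likely store a fresh random IV per file after the header.

```csharp
#nullable enable
using System;
using System.IO;
using System.IO.Compression;
using System.Security.Cryptography;

// Hypothetical flag layout for the single header byte.
[Flags]
public enum Format : byte { None = 0, Compressed = 1, Encrypted = 2 }

public static class HeaderedSerialization
{
    const int KnownBits = (int)(Format.Compressed | Format.Encrypted);

    // Header byte first, then payload -> GZip -> AES -> output, with no MemoryStream.
    public static void Save(Stream output, Action<Stream> serialize, Format format, Aes? aes)
    {
        output.WriteByte((byte)format);
        CryptoStream? crypto = format.HasFlag(Format.Encrypted)
            ? new CryptoStream(output, RequireKey(aes).CreateEncryptor(), CryptoStreamMode.Write, leaveOpen: true)
            : null;
        Stream inner = crypto ?? output;
        GZipStream? gzip = format.HasFlag(Format.Compressed)
            ? new GZipStream(inner, CompressionMode.Compress, leaveOpen: true)
            : null;

        serialize(gzip ?? inner);

        gzip?.Dispose();   // writes the GZip footer into the next layer
        crypto?.Dispose(); // writes the final, padded AES block
    }

    // Read one byte, then hand the rest of the stream to the deserializer
    // through whichever layers the header says were used.
    public static T Load<T>(Stream input, Func<Stream, T> deserialize, Aes? aes)
    {
        int header = input.ReadByte();
        if (header < 0 || (header & ~KnownBits) != 0)
            throw new InvalidDataException($"Bad header byte {header}.");
        var format = (Format)header;

        CryptoStream? crypto = format.HasFlag(Format.Encrypted)
            ? new CryptoStream(input, RequireKey(aes).CreateDecryptor(), CryptoStreamMode.Read, leaveOpen: true)
            : null;
        Stream inner = crypto ?? input;
        GZipStream? gzip = format.HasFlag(Format.Compressed)
            ? new GZipStream(inner, CompressionMode.Decompress, leaveOpen: true)
            : null;
        try { return deserialize(gzip ?? inner); }
        finally { gzip?.Dispose(); crypto?.Dispose(); }
    }

    static Aes RequireKey(Aes? aes) =>
        aes ?? throw new ArgumentNullException(nameof(aes), "The encrypted format needs a key.");
}
```

Because the format travels inside the file, a renamed file still loads correctly, and one extension covers all four combinations.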
If you have the methods
your code can look like this: