I’m writing a serialisation system. The first version works well. It’s a hierarchical system of string keys and data values: to get a particular value, you navigate to a particular node and call getInt(“some key”), and so on.
My issue with the current system is that the file size gets quite large very quickly.
I’m going to combat this by adding a string table. The issue with this is that I can’t think of a way to keep reading files written in the old format. All I have is a file identifier which is 32 bits long.
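To make the string table idea concrete, here is a minimal sketch in Python. It assumes integer values only, and the byte layout (little-endian counts, 16-bit key indices) and the function names are made up for illustration; they are not the format described above.

```python
import struct

def encode_with_string_table(pairs):
    """Encode (key, int_value) pairs, storing each distinct key only once."""
    table = []   # distinct keys, in first-seen order
    index = {}   # key -> position in table
    records = []
    for key, value in pairs:
        if key not in index:
            index[key] = len(table)
            table.append(key)
        records.append((index[key], value))

    out = bytearray()
    out += struct.pack("<I", len(table))            # number of table entries
    for key in table:
        raw = key.encode("utf-8")
        out += struct.pack("<H", len(raw)) + raw    # length-prefixed key
    out += struct.pack("<I", len(records))          # number of records
    for key_id, value in records:
        out += struct.pack("<Hi", key_id, value)    # 2-byte key index + int32
    return bytes(out)

def decode_with_string_table(data):
    """Inverse of encode_with_string_table."""
    pos = 0
    (n_keys,) = struct.unpack_from("<I", data, pos)
    pos += 4
    table = []
    for _ in range(n_keys):
        (length,) = struct.unpack_from("<H", data, pos)
        pos += 2
        table.append(data[pos:pos + length].decode("utf-8"))
        pos += length
    (n_records,) = struct.unpack_from("<I", data, pos)
    pos += 4
    pairs = []
    for _ in range(n_records):
        key_id, value = struct.unpack_from("<Hi", data, pos)
        pos += 6
        pairs.append((table[key_id], value))
    return pairs
```

Each repeated key then costs two bytes per record instead of its full length, which is where the size saving would come from.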
I can change the file identifier, but every time I make another change to the format, I’ll need to change the identifier again.
What’s an elegant way to implement new features while still supporting the old features?
I’ve studied the PNG format and creating chunks seems like a good way to go.
Is there any other advice you can give me on chunk dependencies and so forth?
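For reference, the PNG-style chunk idea could be sketched roughly as follows in Python. PNG frames each chunk as length, four-byte type, data and CRC, and marks a chunk as ancillary (safe to skip) by making the first letter of its type lowercase; the sketch borrows both conventions. The `SER1` identifier and the chunk names used with it are hypothetical.

```python
import struct
import zlib

MAGIC = b"SER1"  # hypothetical 32-bit file identifier; it never has to change

def write_chunk(chunk_type, payload):
    """Frame a payload as: length, 4-byte type, data, CRC-32 of type + data."""
    if len(chunk_type) != 4:
        raise ValueError("chunk type must be exactly 4 bytes")
    crc = zlib.crc32(chunk_type + payload)
    return (struct.pack(">I", len(payload)) + chunk_type + payload
            + struct.pack(">I", crc))

def read_chunks(data, known_types):
    """Return the chunks this reader understands.

    Unknown chunks whose type starts with a lowercase letter are skipped;
    unknown chunks starting with an uppercase letter are treated as critical
    and cause an error, because the file cannot be read correctly without them.
    """
    if data[:4] != MAGIC:
        raise ValueError("not a recognised file")
    pos = 4
    chunks = []
    while pos < len(data):
        (length,) = struct.unpack_from(">I", data, pos)
        chunk_type = data[pos + 4:pos + 8]
        payload = data[pos + 8:pos + 8 + length]
        (crc,) = struct.unpack_from(">I", data, pos + 8 + length)
        if zlib.crc32(chunk_type + payload) != crc:
            raise ValueError("corrupt chunk")
        pos += 12 + length
        if chunk_type in known_types:
            chunks.append((chunk_type, payload))
        elif chunk_type[:1].isupper():
            raise ValueError(f"unknown critical chunk {chunk_type!r}")
        # Otherwise: unknown ancillary chunk, silently skipped.
    return chunks
```

Under this scheme a string table that old readers cannot do without would go in a critical chunk, while optional extras would go in ancillary chunks that old readers simply skip.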
If you need a binary format, look at Protocol Buffers, which Google uses internally for RPCs as well as long-term serialization of records. Each field of a protocol buffer is identified by an integer ID. Old applications ignore (and pass through) the fields that they don’t understand, so you can safely add new fields. You never reuse deprecated field IDs or change the type of a field.
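The "ignore and pass through" behaviour can be illustrated with a small Python sketch. This is not the real Protocol Buffers wire format or API, just a simplified tagged layout (16-bit field ID, 32-bit length, raw bytes) with hypothetical function names, meant to show why unknown fields do not break an old reader.

```python
import struct

def encode_fields(fields):
    """Encode a list of (field_id, payload_bytes) pairs."""
    out = bytearray()
    for field_id, payload in fields:
        out += struct.pack("<HI", field_id, len(payload)) + payload
    return bytes(out)

def decode_fields(data, known_ids):
    """Split a record into fields this reader knows and fields it does not.

    Unknown fields are kept verbatim so they can be written back unchanged.
    """
    known, unknown = {}, []
    pos = 0
    while pos < len(data):
        field_id, length = struct.unpack_from("<HI", data, pos)
        pos += 6
        payload = data[pos:pos + length]
        pos += length
        if field_id in known_ids:
            known[field_id] = payload
        else:
            unknown.append((field_id, payload))
    return known, unknown

# A newer writer adds field 7; an older reader only knows field 1.
record = encode_fields([(1, b"alice"), (7, b"\x01\x02")])
known, unknown = decode_fields(record, known_ids={1})
# The old reader can re-save the record without losing field 7.
resaved = encode_fields(list(known.items()) + unknown)
```

Because every field carries its own ID and length, a reader can step over anything it does not recognise, which is what makes adding fields safe.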
Protocol buffers support primitive types (such as bool, int32, int64, string and byte arrays) as well as repeated and even recursively nested messages. Early versions didn’t support maps, so you had to turn a map into a list of (key, value) pairs; later releases added a map field type, which is encoded on the wire as exactly such a list.
Don’t spend all your time fretting about serialization and deserialization. It’s not as fun as designing protobufs.