Summary:
I have a struct that is read from and written to a file.
This struct changes frequently, and this causes my read() function to get complex.
I need to find a good way to handle change while keeping the bug count low.
Ideally, the code should make it easy to spot the changes between versions.
I have thought through a couple of patterns, but I don’t know whether I have considered all the options.
As you will see, the code is mostly C-like, but I am in the process of turning it into C++.
Details
As I said, my struct changes frequently (almost in every version of the program).
- Some members are deleted, some are added, and some are made more complex. It is not simply a case of a new member appearing in the structure.
So far, changes to the struct have been handled like this:
- in version_1, I used a color map table:
```cpp
struct Obj {
    int color_index;
};

void Read_Obj( File *f, Obj *o ) {
    f->read( f, &o->color_index );
}

void Write_Obj( File *f, Obj *o ) {
    f->write( f, o->color_index );
}
```
- in the next version, I changed it into [r,g,b] form
```cpp
struct Obj {
    int color_r;
    int color_g;
    int color_b;
};

void Read_Obj( File *f, Obj *o ) {
    if( f->version() == File::Version1 ) {
        int color_index;
        f->read( f, &color_index );
        ColorIndex_to_RGB( o, color_index ); // we used color maps back then
    }
    else {
        f->read( f, &o->color_r );
        f->read( f, &o->color_g );
        f->read( f, &o->color_b );
    }
}

void Write_Obj( File *f, Obj *o ) {
    f->write( f, o->color_r );
    f->write( f, o->color_g );
    f->write( f, o->color_b );
}
```
[brief note]
Note that I know I could have used:
```cpp
void Read_Obj( File *f, Obj *o ) {
    if( f->version() == File::Version1 ) {
        Read_Obj_V1( f, o );
    }
    else {
        Read_Obj_V2( f, o );
    }
}
```
but that tends to cause code duplication between the sub-functions, since in practice only 1-2 of the ~20 members of the struct change in each version, so the other ~18 lines stay the same.
Of course, I could switch to this policy if there were a good reason.
[end of brief note]
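One possible middle ground between a single growing `Read_Obj()` and fully separate `Read_Obj_V1()`/`Read_Obj_V2()` is to keep one read function but annotate each changed member with the range of versions in which it exists on disk, so the format history reads top to bottom in one place. The sketch below assumes `std::istream` in place of the real `File` type; the version numbers, the `id` and `alpha` members, the `on_disk()` helper and the palette arithmetic inside `ColorIndex_to_RGB()` are all hypothetical.

```cpp
#include <cassert>
#include <sstream>

// Hypothetical version numbers standing in for File::Version1 etc.
enum Version { V1 = 1, V2 = 2, V3 = 3 };
const int kNoLimit = 1 << 30;

struct Obj {
    int id = 0;                  // hypothetical member that never changed
    int color_r = 0, color_g = 0, color_b = 0;
    int alpha = 255;             // hypothetical member added in V3
};

// Hypothetical stand-in for the real palette lookup.
void ColorIndex_to_RGB(Obj *o, int color_index) {
    o->color_r = o->color_g = o->color_b = color_index * 10;
}

// True if a field stored on disk in versions [since, until)
// is present in a file of version v.
bool on_disk(int v, int since, int until = kNoLimit) {
    return v >= since && v < until;
}

// One read function; each changed member carries its version range,
// so unchanged members are written once and changes stand out.
void Read_Obj(std::istream &in, int v, Obj *o) {
    in >> o->id;                                   // all versions
    if (on_disk(v, V1, V2)) {                      // palette index, V1 only
        int color_index = 0;
        in >> color_index;
        ColorIndex_to_RGB(o, color_index);
    }
    if (on_disk(v, V2)) {                          // RGB since V2
        in >> o->color_r >> o->color_g >> o->color_b;
    }
    if (on_disk(v, V3)) {                          // alpha since V3
        in >> o->alpha;                            // older files keep 255
    }
}
```

The unchanged members (here just `id`) are read unconditionally, so only the 1-2 members that change per version get a condition.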
Now these structs have become complicated and I need to convert them to a class, and work in a more object-oriented fashion.
I have seen a pattern where you use one class per old version to read the data, and then convert that data into the newest class.
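```cpp
class Obj;  // forward declaration: Obj_v1::convert_to() refers to it

class Obj_v1 {
public:
    void read( File *f ) {
        f->read( f, &m_color_index );
    }
    void convert_to( Obj *o ) const { /* code to convert the older object */ }
private:
    int m_color_index = 0;
};

class Obj {
public:
    void read( File *f ) {
        f->read( f, &m_r );
        f->read( f, &m_g );
        f->read( f, &m_b );
    }
    void write( File *f ) const {
        f->write( f, m_r );
        f->write( f, m_g );
        f->write( f, m_b );
    }
private:
    int m_r = 0;
    int m_g = 0;
    int m_b = 0;
};

void Read_Obj( File *f, Obj *o ) {
    if( f->version() == File::Version1 ) {
        Obj_v1 old;  // not "Obj_v1 old();", which would declare a function
        old.read( f );
        old.convert_to( o );
    }
    else {
        o->read( f );
    }
}

void Write_Obj( File *f, Obj *o ) {
    o->write( f );
}
```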
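To see the read-old-class-then-convert pattern run end to end, here is a self-contained sketch. It assumes `std::istream`/`std::ostream` and an explicit `Version` argument in place of the real `File` type, and the three-entry palette inside `Obj_v1::convert_to()` is hypothetical, standing in for the real color map.

```cpp
#include <cassert>
#include <sstream>

// Hypothetical version tags; std::istream replaces the original File type.
enum Version { Version1 = 1, Version2 = 2 };

// Current layout: an RGB triple.
class Obj {
public:
    void read(std::istream &in) { in >> m_r >> m_g >> m_b; }
    void write(std::ostream &out) const {
        out << m_r << ' ' << m_g << ' ' << m_b;
    }
    void set_rgb(int r, int g, int b) { m_r = r; m_g = g; m_b = b; }
    int r() const { return m_r; }
    int g() const { return m_g; }
    int b() const { return m_b; }
private:
    int m_r = 0, m_g = 0, m_b = 0;
};

// Reads the old version-1 layout (a palette index) and converts it.
class Obj_v1 {
public:
    void read(std::istream &in) { in >> m_color_index; }
    void convert_to(Obj *o) const {
        // Hypothetical 3-entry palette standing in for the real color map.
        static const int palette[3][3] = {{0, 0, 0}, {255, 0, 0}, {0, 0, 255}};
        int i = (m_color_index >= 0 && m_color_index < 3) ? m_color_index : 0;
        o->set_rgb(palette[i][0], palette[i][1], palette[i][2]);
    }
private:
    int m_color_index = 0;
};

void Read_Obj(std::istream &in, Version v, Obj *o) {
    if (v == Version1) {
        Obj_v1 old;          // "Obj_v1 old();" would declare a function instead
        old.read(in);
        old.convert_to(o);
    } else {
        o->read(in);
    }
}

// Writing always uses the newest layout.
void Write_Obj(std::ostream &out, const Obj *o) { o->write(out); }
```

Because writing always produces the newest layout, each `Obj_vX` class only ever needs a `read()` and a conversion, never a `write()`.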
However, there are two strategies for dealing with change:
Strategy 1 : direct conversions
```cpp
void Read_Obj( File *f, Obj *o ) {
    if( f->version() == File::Version1 ) {
        Obj_v1 old;
        old.read( f );
        old.convert_to( o );
    }
    else if( f->version() == File::Version2 ) {
        Obj_v2 old;
        old.read( f );
        old.convert_to( o );
    }
    else {
        o->read( f );
    }
}
```
Disadvantage:
- You have to update the `convert_to()` of every `Obj_vX` class each time you change the `Obj` class, which creates many opportunities for new bugs each time.
Benefit:
- You are always able to fit an old concept (struct) to the new one. Compare this with a cascaded strategy (next), where some information may be lost along the way and therefore cannot be used.
Strategy 2 : cascaded conversions
```cpp
void Read_Obj( File *f, Obj *o ) {
    Obj_v1 o1;
    Obj_v2 o2;
    if( f->version() == File::Version1 ) {
        o1.read( f );
        o1.convert_to( &o2 );  // v1 -> v2
        o2.convert_to( o );    // v2 -> current
    }
    else if( f->version() == File::Version2 ) {
        o2.read( f );
        o2.convert_to( o );
    }
    else {
        o->read( f );
    }
}
```
Disadvantages:
- Some information may exist in v1 that was useless in v3 but that v5 could make use of; however, the cascaded conversions have already wiped out this data.
- Objects from older versions will tend to take longer to create.
Benefit:
- You only have to write one `convert_to()` each time you change the `Obj` class. However, a bug in one of the converters in the chain could have more severe effects and could wreck the consistency of the database. You have better chances of finding such a bug, though.
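A minimal sketch of the cascaded strategy, assuming a hypothetical three-version history (v1: palette index, v2: RGB, v3/current: RGB plus alpha), `std::istream` in place of `File`, and a made-up palette formula. Each `upgrade()` overload only knows the next version, so a change to `Obj` touches just the last converter, and a v1 file is upgraded by chaining `upgrade(upgrade(...))`.

```cpp
#include <cassert>
#include <sstream>

// Hypothetical three-version history of Obj.
struct Obj_v1 { int color_index = 0; };
struct Obj_v2 { int r = 0, g = 0, b = 0; };
struct Obj    { int r = 0, g = 0, b = 0, alpha = 255; };  // current

// Each converter knows only about the next version.
Obj_v2 upgrade(const Obj_v1 &o) {
    int grey = o.color_index * 10;      // hypothetical palette lookup
    return Obj_v2{grey, grey, grey};
}
Obj upgrade(const Obj_v2 &o) {
    return Obj{o.r, o.g, o.b, 255};     // alpha did not exist before v3
}

// Read the stored layout, then walk the chain up to the current version.
Obj Read_Obj(std::istream &in, int version) {
    switch (version) {
    case 1: { Obj_v1 o1; in >> o1.color_index; return upgrade(upgrade(o1)); }
    case 2: { Obj_v2 o2; in >> o2.r >> o2.g >> o2.b; return upgrade(o2); }
    default: { Obj o; in >> o.r >> o.g >> o.b >> o.alpha; return o; }
    }
}
```

The information-loss disadvantage is visible here: anything `Obj_v1` held that `Obj_v2` has no field for cannot reach `Obj`, however far down the chain it might become useful again.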
Worries:
- Could conversion after conversion introduce so much noise into objects from older versions that they end up wrong?
Question:
- Are there any other patterns that do a better job at this?
- For those of you who have experience with approaches like the ones I proposed, what do you think of my worries about the above implementations?
- Which solutions are preferable?

Thank you so much!
The `if` is, so to speak, a hidden switch/case, and switch/case in C++ is generally interchangeable with polymorphism. Example:

Then instantiate the appropriate Reader descendant after opening the file and detecting the version number. That way you have only one file-version check in the top-level code, instead of polluting all of the low-level code with the checks.
If code is common to several file versions, for convenience you can also put it into the base reader class.
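A minimal sketch of such a reader hierarchy, assuming `std::istream` in place of `File`, integer version numbers, and a hypothetical `id` member that never changed. The names `ObjReader`, `ObjReaderV1`, `ObjReaderV2` and `make_reader()` are illustrative only. The base class keeps the shared reading code in a non-virtual `read()` and delegates only the version-specific part to a virtual hook (a template-method arrangement), while `Obj` itself knows nothing about files.

```cpp
#include <cassert>
#include <memory>
#include <sstream>
#include <stdexcept>

// Current object; it knows nothing about its persistent representation.
struct Obj {
    int id = 0;                 // hypothetical member, same in all versions
    int r = 0, g = 0, b = 0;
};

// Base reader: holds the code common to all file versions.
class ObjReader {
public:
    virtual ~ObjReader() = default;
    Obj read(std::istream &in) const {
        Obj o;
        in >> o.id;             // common to all versions
        read_color(in, o);      // the only part that differs
        return o;
    }
protected:
    virtual void read_color(std::istream &in, Obj &o) const = 0;
};

class ObjReaderV1 : public ObjReader {
protected:
    void read_color(std::istream &in, Obj &o) const override {
        int index = 0;
        in >> index;
        o.r = o.g = o.b = index * 10;   // hypothetical palette lookup
    }
};

class ObjReaderV2 : public ObjReader {
protected:
    void read_color(std::istream &in, Obj &o) const override {
        in >> o.r >> o.g >> o.b;
    }
};

// The single place where the file version is checked.
std::unique_ptr<ObjReader> make_reader(int version) {
    switch (version) {
    case 1: return std::make_unique<ObjReaderV1>();
    case 2: return std::make_unique<ObjReaderV2>();
    default: throw std::runtime_error("unsupported file version");
    }
}
```

Each reader builds the current `Obj` directly, so there are no intermediate `Obj_vX` classes; supporting a new file version means adding one reader subclass and one `case`.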
I would strongly advise against the variant with `class Obj_v1` and `class Obj` where the `read()` method belongs to the `Obj` itself. That way one easily ends up with circular dependencies, and it is also a bad idea to make an object aware of its persistent representation. IME (in my experience) it is better to make a separate reader class hierarchy responsible for that. (Compare `std::iostream`, `std::string` and `operator<<`: the stream doesn't know the string, the string doesn't know the stream; only `operator<<` knows both.)

Otherwise, I personally do not see any big difference between your “Strategy 1” and “Strategy 2”. They both use `convert_to()`, which I personally think is superfluous. IME the polymorphic solution should be used instead, automatically converting everything to the up-to-date version of the object, `class Obj`, without the intermediate `class Obj_v1` and `class Obj_v2`. Since polymorphism gives you a dedicated read function for every version, ensuring that the object is properly recreated from the data read is easy. This is precisely what polymorphism was intended to address, and it is how I generally handle such tasks myself.
This is related to object serialization, but I have not seen a single serialization framework (my information is likely outdated) that was capable of supporting several versions of the same class.
I have personally ended up with the following serialization/deserialization class hierarchy several times:
Hope that helps.