I have some model objects that I save in a DB, serialized with protobuf. Before saving, I want to compare the version I am about to save with the existing one, to avoid storing the same version twice.
Ideally I would do:
byte[] existingBlob = GetFromDBExistingModelObject();
ModelType existingModel = existingBlob.Deserialize();
if (!model.Equals(existingModel))
{
    byte[] serializedModel = model.Serialize();
    Save(serializedModel); // Save the new blob in the DB
}
However, I would have to implement .Equals on every model object, which would be quite painful. I would like to do this instead:
byte[] existingBlob = GetFromDBExistingModelObject();
byte[] serializedModel = model.Serialize();
if (!compareBlob(existingBlob, serializedModel))
{
    Save(serializedModel);
}
private bool compareBlob(byte[] existingBlob, byte[] serializedModel)
{
    if (serializedModel.Length != existingBlob.Length)
    {
        return false;
    }
    return !serializedModel.Where((t, i) => t != existingBlob[i]).Any();
}
I also do this for performance, because I don’t have to deserialize the existingBlob.
What do you think of this implementation? Can I rely on this comparison? I use protobuf for serialization.
Thanks for your comment.
protobuf-net will produce predictable output, but strictly speaking that is not guaranteed by the spec; there are 2 edge cases (field order, and sub-normal forms† for varint encoding) that technically could produce different output with the same meaning, but protobuf-net currently always produces the same output.
I am toying with adding an option to deliberately use sub-normal varint forms to avoid some memory shuffling, but that would be opt-in only.
So; as long as you aren’t building your binary files by appending (protobuf is an appendable format, but obviously all bets are off if you are appending in arbitrary orders), then yes: the data on the wire should be predictable, and you can compare the byte sequence to test for equality.
As a minor note, I would recommend a regular for loop here, for efficiency. (If you are particularly speed-crazy, you could even use unsafe code and compare it as an int* or long* instead (taking 1/4 or 1/8 of the tests), and just check the last few bytes manually.)
You might also consider comparing a hash (sha1 etc) instead of byte-by-byte; this would be especially useful for large models, especially if you can store the hashed value along with the original (so you never have to fetch the original existing BLOB, just the existing hash).
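Both suggestions can be sketched roughly as follows. This is only a minimal illustration, not code from protobuf-net: the class name BlobComparer and the method names BlobsEqual and ComputeHash are hypothetical, and SHA-1 is used simply because it is mentioned above.

```csharp
using System;
using System.Security.Cryptography;

// Hypothetical helper; the names here are illustrative only.
static class BlobComparer
{
    // Plain for-loop comparison: exits on the first differing byte
    // and avoids the per-element delegate call of the LINQ version.
    public static bool BlobsEqual(byte[] a, byte[] b)
    {
        if (ReferenceEquals(a, b)) return true;
        if (a == null || b == null || a.Length != b.Length) return false;
        for (int i = 0; i < a.Length; i++)
        {
            if (a[i] != b[i]) return false;
        }
        return true;
    }

    // Hash that could be stored next to the blob, so that only the
    // (small) hash needs to be fetched for later comparisons.
    public static byte[] ComputeHash(byte[] blob)
    {
        using (SHA1 sha1 = SHA1.Create())
        {
            return sha1.ComputeHash(blob);
        }
    }
}
```

With a stored hash, the check becomes something like BlobComparer.BlobsEqual(storedHash, BlobComparer.ComputeHash(serializedModel)). Keep in mind that equal hashes only indicate equal blobs with very high probability, not with certainty.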
† : specifically, the bit-sequence 10000000 or 00000000 at the “big end” of a varint just means “and more zeros at the big end” (with or without more data to follow), so has no impact on the number; hence any (reasonable) number of 0x80 0x80 0x00 on the end of a varint does not change the result; there is a use-case whereby this could be used to avoid having to move data around, by deliberately using an oversized varint as a length-prefix.
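To make the footnote concrete, here is a small hand-written varint decoder. It is a sketch for illustration only, not protobuf-net's actual reader, and the names VarintDemo and DecodeVarint are hypothetical. It shows that a minimal and a padded encoding decode to the same number while being different byte sequences.

```csharp
using System;

// Hypothetical demo class; not part of protobuf-net.
static class VarintDemo
{
    // Decodes one base-128 varint from the start of data.
    // Each byte carries 7 payload bits (least-significant group first);
    // the high bit (0x80) means "more bytes follow".
    public static ulong DecodeVarint(byte[] data, out int bytesRead)
    {
        ulong value = 0;
        int shift = 0;
        for (int i = 0; i < data.Length; i++)
        {
            byte b = data[i];
            if (shift < 64)
            {
                value |= (ulong)(b & 0x7F) << shift;
            }
            shift += 7;
            if ((b & 0x80) == 0)
            {
                bytesRead = i + 1;
                return value;
            }
        }
        throw new FormatException("Truncated varint");
    }
}
```

For example, 300 is normally written as 0xAC 0x02, but 0xAC 0x82 0x80 0x00 also decodes to 300: the extra bytes only contribute zero bits at the big end. The two encodings mean the same thing yet would fail a byte-by-byte comparison, which is why the comparison is only safe when the writer always emits the same form.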