
The Archive Base


Editorial Team
Asked: June 13, 2026


If I serialize an object using schema version 1, and later update the schema to version 2 (say, by adding a field), am I required to use schema version 1 when later deserializing the object? Ideally I would like to use only schema version 2 and have the deserialized object take the default value for the field that was added to the schema after the object was originally serialized.

Maybe some code will explain better…

schema1:

{"type": "record",
 "name": "User",
 "fields": [
  {"name": "firstName", "type": "string"}
 ]}

schema2:

{"type": "record",
 "name": "User",
 "fields": [
  {"name": "firstName", "type": "string"},
  {"name": "lastName", "type": "string", "default": ""}
 ]}

using the generic non-code-generation approach:

// serialize
ByteArrayOutputStream out = new ByteArrayOutputStream();
Encoder encoder = EncoderFactory.get().binaryEncoder(out, null);
GenericDatumWriter<GenericRecord> writer = new GenericDatumWriter<GenericRecord>(schema1);
GenericRecord datum = new GenericData.Record(schema1);
datum.put("firstName", "Jack");
writer.write(datum, encoder);
encoder.flush();
out.close();
byte[] bytes = out.toByteArray();

// deserialize
// I would like to not have any reference to schema1 below here
DatumReader<GenericRecord> reader = new GenericDatumReader<GenericRecord>(schema2);
Decoder decoder = DecoderFactory.get().binaryDecoder(bytes, null);
GenericRecord result = reader.read(null, decoder);

results in an EOFException. Using the jsonEncoder results in an AvroTypeException.

I know it will work if I pass both schema1 and schema2 to the GenericDatumReader constructor, but I’d like to not have to keep a repository of all previous schemas and also somehow keep track of which schema was used to serialize each particular object.

I also tried the code-gen approach, first serializing to a file using the User class generated from schema1:

User user = new User();
user.setFirstName("Jack");
DatumWriter<User> writer = new SpecificDatumWriter<User>(User.class);
FileOutputStream out = new FileOutputStream("user.avro");
Encoder encoder = EncoderFactory.get().binaryEncoder(out, null);
writer.write(user, encoder);
encoder.flush();
out.close();

Then updating the schema to version 2, regenerating the User class, and attempting to read the file:

DatumReader<User> reader = new SpecificDatumReader<User>(User.class);
FileInputStream in = new FileInputStream("user.avro");
Decoder decoder = DecoderFactory.get().binaryDecoder(in, null);
User user = reader.read(null, decoder);

but it also results in an EOFException.

Just for comparison’s sake, what I’m trying to do seems to work with protobufs…

format:

option java_outer_classname = "UserProto";
message User {
    optional string first_name = 1;
}

serialize:

UserProto.User.Builder user = UserProto.User.newBuilder();
user.setFirstName("Jack");
FileOutputStream out = new FileOutputStream("user.data");
user.build().writeTo(out);

add optional last_name to format, regen UserProto, and deserialize:

FileInputStream in = new FileInputStream("user.data");
UserProto.User user = UserProto.User.parseFrom(in);

as expected, user.getLastName() is the empty string.

Can something like this be done with Avro?


1 Answer

  1. Editorial Team
    Added an answer on June 13, 2026 at 1:49 am

    Avro and Protocol Buffers have different approaches to handling versioning, and which approach is better depends on your use case.

    In Protocol Buffers you have to explicitly tag every field with a number, and those numbers are stored along with the fields’ values in the binary representation. Thus, as long as you never change the meaning of a number in a subsequent schema version, you can still decode a record encoded in a different schema version. If the decoder sees a tag number that it doesn’t recognise, it can simply skip it.
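
A rough sketch of why tagging makes this work. This is a toy format invented for illustration, not the real Protocol Buffers wire format: the class name `TaggedToy` and the `[tag][length][bytes]` layout are assumptions. The point is only that a length-prefixed, tagged field can be stepped over by a reader that does not know its tag.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Toy tagged format (NOT the real Protocol Buffers wire format):
// each field is written as [tag:int][length:int][UTF-8 bytes].
final class TaggedToy {

    // Encode a tag -> value map, one tagged field after another.
    static byte[] encode(Map<Integer, String> fields) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (Map.Entry<Integer, String> e : fields.entrySet()) {
            byte[] value = e.getValue().getBytes(StandardCharsets.UTF_8);
            ByteBuffer header = ByteBuffer.allocate(8).putInt(e.getKey()).putInt(value.length);
            out.write(header.array(), 0, 8);
            out.write(value, 0, value.length);
        }
        return out.toByteArray();
    }

    // Decode, keeping only the tags this reader knows about.
    static Map<Integer, String> decode(byte[] data, Set<Integer> knownTags) {
        Map<Integer, String> result = new LinkedHashMap<>();
        ByteBuffer in = ByteBuffer.wrap(data);
        while (in.hasRemaining()) {
            int tag = in.getInt();
            byte[] value = new byte[in.getInt()];
            in.get(value);
            if (knownTags.contains(tag)) {
                result.put(tag, new String(value, StandardCharsets.UTF_8));
            }
            // Unknown tag: the length prefix already let us step over the value.
        }
        return result;
    }

    public static void main(String[] args) {
        Map<Integer, String> v2 = new LinkedHashMap<>();
        v2.put(1, "Jack");   // first_name
        v2.put(2, "Smith");  // last_name, unknown to a version 1 reader
        byte[] data = encode(v2);
        System.out.println(decode(data, Set.of(1)));     // version 1 reader
        System.out.println(decode(data, Set.of(1, 2)));  // version 2 reader
    }
}
```

The reverse direction works the same way: a newer reader that finds no field with tag 2 in old data simply ends up without a value for it and can fall back to a default, which is the behaviour the question observed with `getLastName()`.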

    Avro takes a different approach: there are no tag numbers. Instead, the binary layout is completely determined by the schema that the encoding program uses, which is called the writer's schema. (A record's fields are simply stored one after another in the binary encoding, without any tagging or separator, and the order is determined by the writer's schema.) This makes the encoding more compact, and saves you from having to manually maintain tags in the schema. But it does mean that for reading, you have to know the exact schema with which the data was written, or you won't be able to make sense of it.
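
To see why the question's code runs off the end of the input, here is a toy positional format. It is not the real Avro binary encoding; `PositionalToy` and its `[length][bytes]` layout are made up for illustration, and the unchecked `BufferUnderflowException` stands in for the `EOFException` from the question.

```java
import java.io.ByteArrayOutputStream;
import java.nio.BufferUnderflowException;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy positional format (NOT the real Avro binary encoding): values are written
// back to back as [length:int][UTF-8 bytes], in the order the schema lists them.
final class PositionalToy {

    static byte[] encode(List<String> writerFields, Map<String, String> record) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (String field : writerFields) {
            byte[] value = record.get(field).getBytes(StandardCharsets.UTF_8);
            out.write(ByteBuffer.allocate(4).putInt(value.length).array(), 0, 4);
            out.write(value, 0, value.length);
        }
        return out.toByteArray();
    }

    // Walks the given field list; nothing in the bytes says which field is which.
    static Map<String, String> decode(byte[] data, List<String> assumedFields) {
        ByteBuffer in = ByteBuffer.wrap(data);
        Map<String, String> record = new LinkedHashMap<>();
        for (String field : assumedFields) {
            // getInt() throws BufferUnderflowException once the bytes run out.
            byte[] value = new byte[in.getInt()];
            in.get(value);
            record.put(field, new String(value, StandardCharsets.UTF_8));
        }
        return record;
    }

    public static void main(String[] args) {
        List<String> schema1 = List.of("firstName");
        List<String> schema2 = List.of("firstName", "lastName");
        byte[] data = encode(schema1, Map.of("firstName", "Jack"));
        System.out.println(decode(data, schema1));
        try {
            decode(data, schema2);
        } catch (BufferUnderflowException e) {
            System.out.println("schema2 alone cannot decode schema1 bytes: " + e);
        }
    }
}
```

The decoder that assumes schema 2 reads `firstName` correctly, then tries to read a `lastName` that was never written and finds no bytes left.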

    Knowing the writer's schema is essential for decoding Avro; the reader's schema is a convenience layered on top of it. If you're doing code generation in a program that needs to read Avro data, you can do the codegen off the reader's schema, which saves you from having to regenerate it every time the writer's schema changes (assuming it changes in a way that can be resolved). But it doesn't save you from having to know the writer's schema.
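
A toy version of that two-step process, roughly what happens when both schemas are passed to the `GenericDatumReader` constructor as mentioned in the question. Everything here is a simplified stand-in rather than Avro's own resolution code: `ResolveToy` is a hypothetical name, a schema is reduced to a list of string fields, and a `null` default means "no default".

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy schema resolution over a positional [length:int][UTF-8 bytes] format.
final class ResolveToy {

    static byte[] encode(List<String> writerFields, Map<String, String> record) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (String field : writerFields) {
            byte[] value = record.get(field).getBytes(StandardCharsets.UTF_8);
            out.write(ByteBuffer.allocate(4).putInt(value.length).array(), 0, 4);
            out.write(value, 0, value.length);
        }
        return out.toByteArray();
    }

    // readerDefaults maps each reader field to its default (null = no default).
    static Map<String, String> read(byte[] data, List<String> writerFields,
                                    Map<String, String> readerDefaults) {
        // Step 1: the writer's schema is what splits the bytes into fields.
        ByteBuffer in = ByteBuffer.wrap(data);
        Map<String, String> written = new HashMap<>();
        for (String field : writerFields) {
            byte[] value = new byte[in.getInt()];
            in.get(value);
            written.put(field, new String(value, StandardCharsets.UTF_8));
        }
        // Step 2: the reader's schema decides what the program gets to see.
        Map<String, String> result = new LinkedHashMap<>();
        for (Map.Entry<String, String> f : readerDefaults.entrySet()) {
            if (written.containsKey(f.getKey())) {
                result.put(f.getKey(), written.get(f.getKey()));
            } else if (f.getValue() != null) {
                result.put(f.getKey(), f.getValue()); // field added later: use default
            } else {
                throw new IllegalStateException("no value and no default for " + f.getKey());
            }
        }
        return result; // writer fields the reader does not list are dropped
    }

    public static void main(String[] args) {
        List<String> schema1 = List.of("firstName");
        Map<String, String> schema2 = new LinkedHashMap<>();
        schema2.put("firstName", null); // no default
        schema2.put("lastName", "");    // added in version 2, defaults to ""
        byte[] data = encode(schema1, Map.of("firstName", "Jack"));
        System.out.println(read(data, schema1, schema2));
    }
}
```

Note that `read` still takes `writerFields`: the default for `lastName` can only be filled in after the bytes have been split correctly using schema 1.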

    Pros & Cons

    Avro’s approach is good in an environment where you have lots of records that are known to have the exact same schema version, because you can just include the schema in the metadata at the beginning of the file, and know that the next million records can all be decoded using that schema. This happens a lot in a MapReduce context, which explains why Avro came out of the Hadoop project.
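
A minimal sketch of the "schema once, then many records" layout. Avro's own container files (written with `DataFileWriter`) are the real thing; the format below is a made-up simplification, and `ContainerToy`, `writeFile` and `readFile` are hypothetical names.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy container: a header listing the writer's fields, then untagged records.
final class ContainerToy {

    static void putInt(ByteArrayOutputStream out, int n) {
        out.write(ByteBuffer.allocate(4).putInt(n).array(), 0, 4);
    }

    static void putString(ByteArrayOutputStream out, String s) {
        byte[] b = s.getBytes(StandardCharsets.UTF_8);
        putInt(out, b.length);
        out.write(b, 0, b.length);
    }

    static String getString(ByteBuffer in) {
        byte[] b = new byte[in.getInt()];
        in.get(b);
        return new String(b, StandardCharsets.UTF_8);
    }

    static byte[] writeFile(List<String> schema, List<Map<String, String>> records) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        // Header: the writer's schema, stored exactly once.
        putInt(out, schema.size());
        for (String field : schema) putString(out, field);
        // Body: every record in schema order, with no per-record schema data.
        for (Map<String, String> record : records) {
            for (String field : schema) putString(out, record.get(field));
        }
        return out.toByteArray();
    }

    static List<Map<String, String>> readFile(byte[] file) {
        ByteBuffer in = ByteBuffer.wrap(file);
        List<String> schema = new ArrayList<>();
        int fieldCount = in.getInt();
        for (int i = 0; i < fieldCount; i++) schema.add(getString(in));
        List<Map<String, String>> records = new ArrayList<>();
        while (in.hasRemaining()) {
            Map<String, String> record = new LinkedHashMap<>();
            for (String field : schema) record.put(field, getString(in));
            records.add(record);
        }
        return records;
    }

    public static void main(String[] args) {
        byte[] file = writeFile(List.of("firstName"),
                List.of(Map.of("firstName", "Jack"), Map.of("firstName", "Jill")));
        System.out.println(readFile(file));
    }
}
```

The reader needs no outside knowledge here because the file carries its writer's schema, and the cost of storing it is spread over every record in the file.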

    Protocol Buffers’ approach is probably better for RPC, where individual objects are being sent over the network (as request parameters or return values). If you use Avro here, you may have different clients and different servers all with different schema versions, so you’d have to tag every binary-encoded blob with the Avro schema version it’s using, and maintain a registry of schemas. At that point you might as well have used Protocol Buffers’ built-in tagging.
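
What that per-message bookkeeping might look like, again as a toy under stated assumptions: `RegistryToy` and its in-memory `REGISTRY` map are hypothetical, and each message is prefixed with a 4-byte version number that the decoder uses to look up the writer's field order.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy message format: [schemaVersion:int] followed by untagged field values.
final class RegistryToy {

    // Hypothetical registry: schema version -> field order used by its writers.
    static final Map<Integer, List<String>> REGISTRY = Map.of(
            1, List.of("firstName"),
            2, List.of("firstName", "lastName"));

    static byte[] encode(int version, Map<String, String> record) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write(ByteBuffer.allocate(4).putInt(version).array(), 0, 4);
        for (String field : REGISTRY.get(version)) {
            byte[] value = record.get(field).getBytes(StandardCharsets.UTF_8);
            out.write(ByteBuffer.allocate(4).putInt(value.length).array(), 0, 4);
            out.write(value, 0, value.length);
        }
        return out.toByteArray();
    }

    static Map<String, String> decode(byte[] message) {
        ByteBuffer in = ByteBuffer.wrap(message);
        int version = in.getInt();
        // Without a registry entry the rest of the bytes cannot be interpreted.
        List<String> writerFields = REGISTRY.get(version);
        if (writerFields == null) {
            throw new IllegalArgumentException("unknown schema version " + version);
        }
        Map<String, String> record = new LinkedHashMap<>();
        for (String field : writerFields) {
            byte[] value = new byte[in.getInt()];
            in.get(value);
            record.put(field, new String(value, StandardCharsets.UTF_8));
        }
        return record;
    }

    public static void main(String[] args) {
        System.out.println(decode(encode(1, Map.of("firstName", "Jack"))));
        System.out.println(decode(encode(2, Map.of("firstName", "Jack", "lastName", "Smith"))));
    }
}
```

Every client and server has to share (or be able to fetch) the same registry, which is the extra moving part the paragraph above is warning about.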


© 2021 The Archive Base. All Rights Reserved
With Love by The Archive Base
