I’m looking at using Avro on Hadoop, but I am concerned about the serialization of large data structures and about how to add methods to the (data) classes.
The example (taken from http://blog.voidsearch.com/bigdata/apache-avro-in-practice/) shows a model of Facebook users.
{
"namespace": "test.avro",
"name": "FacebookUser",
"type": "record",
"fields": [
{"name": "name", "type": "string"},
...,
{"name": "friends", "type": {"type": "array", "items": "FacebookUser"}}
]
}
Does Avro serialize the complete social graph of a Facebook user in this model?
[That is, if I want to serialize one user, does the serialization include all its friends and their friends and so on?]
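For what it's worth, Avro schemas have no reference type: the items of an array of records are written inline, so a friends field of type FacebookUser embeds every friend record in full, and each of those embeds its own friends. The toy encoder below is only a sketch of that by-value behaviour in plain Java. It does not use the Avro library, and NestedUser and ByValueEncoder are hypothetical names made up for the illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a record whose "friends" field holds full records.
final class NestedUser {
    final String name;
    final List<NestedUser> friends = new ArrayList<>();

    NestedUser(String name) {
        this.name = name;
    }
}

// Toy text encoding (NOT Avro's binary format) that mimics by-value nesting.
final class ByValueEncoder {
    // Each friend is written inline, in full, before the enclosing record ends.
    static void write(NestedUser user, StringBuilder out) {
        out.append(user.name).append('[');
        for (NestedUser friend : user.friends) {
            write(friend, out); // recursion: friends of friends are written too
        }
        out.append(']');
    }

    static String encode(NestedUser user) {
        StringBuilder out = new StringBuilder();
        write(user, out);
        return out.toString();
    }
}
```

With alice having bob as a friend and bob having carol, encoding alice yields alice[bob[carol[]]]: the whole reachable tree ends up in one record. If carol listed alice as a friend in turn, this recursion would never terminate, which is one more reason to store IDs instead.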
If the answer is yes, I’d rather store the IDs of friends instead of references, and look them up in my application whenever needed. In that case I would like to be able to add a method that returns the actual friends instead of their IDs.
How can I wrap/extend generated Avro Java classes to add methods?
(This includes methods that return, for example, the friend count.)
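One way to picture the ID-based design is a hand-written wrapper around the generated class, so that regenerating the sources never overwrites the extra methods. The sketch below is plain Java with assumptions throughout: GeneratedUser is a hypothetical stand-in for what the Avro compiler would produce from a schema with an array of long IDs, UserView is an invented wrapper name, and the Map stands in for whatever lookup the application really uses.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a generated class whose "friends" field
// is an array of long IDs rather than an array of full user records.
final class GeneratedUser {
    private final long id;
    private final String name;
    private final List<Long> friendIds;

    GeneratedUser(long id, String name, List<Long> friendIds) {
        this.id = id;
        this.name = name;
        this.friendIds = friendIds;
    }

    long getId() { return id; }
    String getName() { return name; }
    List<Long> getFriendIds() { return friendIds; }
}

// Hand-written wrapper: the generated code stays untouched.
final class UserView {
    private final GeneratedUser user;
    // Assumption: an in-memory map stands in for a cache or database lookup.
    private final Map<Long, GeneratedUser> store;

    UserView(GeneratedUser user, Map<Long, GeneratedUser> store) {
        this.user = user;
        this.store = store;
    }

    int getFriendCount() {
        return user.getFriendIds().size();
    }

    // Resolves IDs to records on demand; IDs missing from the store are skipped.
    List<GeneratedUser> getFriends() {
        List<GeneratedUser> friends = new ArrayList<>();
        for (Long friendId : user.getFriendIds()) {
            GeneratedUser friend = store.get(friendId);
            if (friend != null) {
                friends.add(friend);
            }
        }
        return friends;
    }
}
```

Only the IDs are serialized with each user, and friends are loaded when getFriends() is called. The trade-off against injecting methods into the generated class is that callers work with UserView rather than with the generated type itself.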
Regarding the second question: How can I wrap/extend generated Avro Java classes to add methods?
You can use AspectJ to inject new methods into an existing or generated class. AspectJ is required only at compile time. The approach is illustrated below.
Define a Person record as Avro IDL (person.avdl):
Use Maven and the avro-maven-plugin to generate Java sources from the AVDL:
The above configuration presumes that the person.avdl file is in src/main/resources/avro. Sources are generated in target/generated-sources/java.
The generated Person.java has two methods: getFirstName() and getLastName(). If you want to extend it with another method, getCompleteName() = firstName + lastName, you can inject this method with the following aspect:
Use the aspectj-maven-plugin to weave this aspect with the generated code:
And the result:
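The original listing is not reproduced here. As a rough illustration of what calling code sees once the method is available, the stand-in class below writes getCompleteName() by hand where the real build would have the aspect contribute it. The single-space separator and the sample names are assumptions, not taken from the original answer.

```java
// Stand-in for the generated Person class after weaving. In the real build
// getCompleteName() comes from the aspect, not from hand-written code.
final class Person {
    private final String firstName;
    private final String lastName;

    Person(String firstName, String lastName) {
        this.firstName = firstName;
        this.lastName = lastName;
    }

    String getFirstName() { return firstName; }
    String getLastName() { return lastName; }

    // Assumption: first and last name are joined by a single space.
    String getCompleteName() {
        return getFirstName() + " " + getLastName();
    }
}
```

Calling new Person("John", "Doe").getCompleteName() then returns "John Doe", and callers use the method exactly as if it had been generated with the rest of the class.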