I’m developing a proof-of-concept object (de)serialization framework that should ideally be able to serialize any Object and gather information about its class. I started implementing it with reflection, to:
- Access the type hierarchy (superclasses, interfaces, and so on)
- Find all fields of the object and read their values
Serializing is the ‘easy’ part: it can be done by applying these rules recursively to an object until I reach null or primitive values. Deserialization is where I’m stuck.
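The recursive walk described above can be sketched as follows. This is a minimal illustration, not the OP’s code: the class name ReflectiveSerializer is hypothetical, the output only approximates the XML format shown next (boxed type names such as "Integer" instead of "int"), and there is no cycle detection or XML escaping. On recent JDKs (16+), setAccessible(true) is refused for the private fields of JDK classes such as String, so the sketch is only expected to work on your own classes.

```java
import java.lang.reflect.Array;
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.Set;

public final class ReflectiveSerializer {

    // Field.get and Array.get box primitive values, so these wrappers are the "leaf" types.
    private static final Set<Class<?>> LEAVES = Set.of(
            Boolean.class, Byte.class, Character.class, Short.class,
            Integer.class, Long.class, Float.class, Double.class);

    public static String serialize(Object root) {
        StringBuilder out = new StringBuilder();
        try {
            write("root", root, out);
        } catch (IllegalAccessException e) {
            throw new IllegalStateException(e);
        }
        return out.toString();
    }

    // Recursion stops at null and at boxed primitives; cyclic object graphs would recurse forever.
    private static void write(String name, Object value, StringBuilder out)
            throws IllegalAccessException {
        if (value == null) {
            out.append("<null name=\"").append(name).append("\"/>");
            return;
        }
        Class<?> type = value.getClass();
        if (LEAVES.contains(type)) {
            out.append("<primitive name=\"").append(name)
               .append("\" type=\"").append(type.getSimpleName())
               .append("\" value=\"").append(value).append("\"/>");
        } else if (type.isArray()) {
            out.append("<array name=\"").append(name)
               .append("\" basetype=\"").append(type.getComponentType().getName()).append("\">");
            for (int i = 0; i < Array.getLength(value); i++) {
                write(String.valueOf(i), Array.get(value, i), out);
            }
            out.append("</array>");
        } else {
            out.append("<object name=\"").append(name)
               .append("\" class=\"").append(type.getName()).append("\">");
            // Walk the class and all its superclasses so inherited fields are included.
            for (Class<?> c = type; c != null; c = c.getSuperclass()) {
                for (Field f : c.getDeclaredFields()) {
                    if (Modifier.isStatic(f.getModifiers())) {
                        continue;
                    }
                    f.setAccessible(true); // can be refused, e.g. for JDK classes on Java 16+
                    write(f.getName(), f.get(value), out);
                }
            }
            out.append("</object>");
        }
    }
}
```

Field order from getDeclaredFields() is unspecified, so two runs on different JVMs may list fields differently.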
Starting with a simple object, the “Hello World” String, I have this serialization:
<object type="java.lang.String">
<primitive name="count" type="int" value="11" />
<primitive name="hash" type="int" value="0" />
<primitive name="offset" type="int" value="0" />
<array name="value" basetype="char">
<value>H</value>
<value>e</value>
<value>l</value>
...
<value>r</value>
<value>l</value>
<value>d</value>
</array>
</object>
That one is easy to deserialize: the String class has a default constructor that I can call via reflection, and then I can set all its fields. Now, suppose I have the following serialization for an object:
<object class="some-class-with-no-default-constructor">
<object name="some-attrib-name" class="attrib-1-class">
<primitive name="size" type="int" value="5" />
...
</object>
</object>
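The default-constructor route that works for String can be sketched like this, assuming the parsed field values are already available in a map keyed by field name (the class name DefaultConstructorDeserializer and the map are hypothetical). For a class without a no-argument constructor, such as the one above, getDeclaredConstructor() throws NoSuchMethodException, which is exactly where this approach stops.

```java
import java.lang.reflect.Constructor;
import java.lang.reflect.Field;
import java.util.Map;

public final class DefaultConstructorDeserializer {

    // Instantiates `type` through its no-arg constructor, then assigns each named field.
    public static <T> T deserialize(Class<T> type, Map<String, Object> fieldValues)
            throws ReflectiveOperationException {
        Constructor<T> ctor = type.getDeclaredConstructor(); // NoSuchMethodException if absent
        ctor.setAccessible(true);
        T instance = ctor.newInstance();
        for (Map.Entry<String, Object> e : fieldValues.entrySet()) {
            Field f = findField(type, e.getKey());
            f.setAccessible(true);
            f.set(instance, e.getValue()); // boxed values are unwrapped for primitive fields
        }
        return instance;
    }

    // Looks the field up in the class first, then in its superclasses.
    private static Field findField(Class<?> type, String name) throws NoSuchFieldException {
        for (Class<?> c = type; c != null; c = c.getSuperclass()) {
            try {
                return c.getDeclaredField(name);
            } catch (NoSuchFieldException ignored) {
                // keep searching up the hierarchy
            }
        }
        throw new NoSuchFieldException(name);
    }
}
```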
What happens if there is no default constructor AND every constructor that takes arguments rejects ‘null’ values by throwing some kind of exception, so that I can’t instantiate the class via reflection?
The question is: “Is there a way to instantiate an ‘empty object’ of some class and set its fields manually after instantiation, without calling any of its constructors?” I’m open to discussing other strategies too, of course.
Thank you.
EDIT
Since this is a proof-of-concept environment, and I’m therefore not considering security restrictions, I’ve found a way to instantiate any object without calling its constructor, through the sun.misc.Unsafe class.
public final class A {
private final Object o;
private A(final Object o) { if (o == null) throw new Error(); this.o = o; }
public static A a() { return new A(new Object()); }
public Object getO() { return o; }
}
The class shown above was proposed in one of the answers below. It can be instantiated, and its final field set correctly (provided, of course, that no security restrictions apply), with the following code:
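import java.lang.reflect.Field;
import java.util.Arrays;
import sun.misc.Unsafe;

public class UnsafeDemo { // hypothetical wrapper class for the two methods below
    private static Unsafe getUnsafe() throws Exception {
        // Look the singleton up by name ("theUnsafe" in OpenJDK-based JDKs) instead of relying on declaration order
        Field vDeclaredField = Unsafe.class.getDeclaredField("theUnsafe");
        vDeclaredField.setAccessible(true);
        Unsafe vUnsafe = (Unsafe) vDeclaredField.get(null);
        vDeclaredField.setAccessible(false);
        return vUnsafe;
    }

    public static void main(String[] args) throws Exception {
        // allocateInstance creates an A without running any of its constructors
        A objectA = (A) getUnsafe().allocateInstance(A.class);
        Field fieldO = A.class.getDeclaredField("o");
        boolean oldAccessibilityValue = fieldO.isAccessible();
        fieldO.setAccessible(true);
        Object objectOParameter = Arrays.asList(1,2,3,4); //could be any object
        fieldO.set(objectA, objectOParameter);
        fieldO.setAccessible(oldAccessibilityValue); //I personally prefer setting it to old value
        assert(objectOParameter.equals(objectA.getO())); // only evaluated when run with -ea
    }
}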
So, can you see any other problems that are not related to the SecurityManager itself?
It can’t be done reliably.
Suppose you have the following class:
First of all, you have the problem you mention: a non-default constructor that takes an argument and throws an exception when given null.
Second, the argument to the constructor can (in this case, will) define the value of a final instance field, which you cannot reliably control after the object has been built, either because the memory-model semantics of final fields may cause visibility problems if the object has already been published to other threads, or because a SecurityManager won’t allow you to modify the final field.
Finally, the constructor is private (or protected, or package-private, whatever). If a security manager is installed, it might completely block your attempts to call setAccessible(true) on the constructor to force it to be called.
So I’d either simply drop the project, as you propose, or restrict the kinds of objects your framework can (de)serialize.
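The constructor problem can be seen with a small sketch that forces every declared constructor open with setAccessible(true) and passes null (or zero/false for primitives) as arguments. A class like A from the question defeats it: its private constructor runs, throws, and the failure surfaces as an InvocationTargetException. The class and method names here are hypothetical.

```java
import java.lang.reflect.Array;
import java.lang.reflect.Constructor;
import java.lang.reflect.InvocationTargetException;

public final class ForcedConstructorCall {

    // Tries every declared constructor (private ones included), passing null for
    // reference parameters and zero/false for primitives. Returns null if none succeeds.
    public static Object tryDefaults(Class<?> type) {
        for (Constructor<?> ctor : type.getDeclaredConstructors()) {
            Class<?>[] params = ctor.getParameterTypes();
            Object[] args = new Object[params.length];
            for (int i = 0; i < params.length; i++) {
                args[i] = defaultValue(params[i]);
            }
            try {
                ctor.setAccessible(true); // a SecurityManager or module rules may refuse this
                return ctor.newInstance(args);
            } catch (InvocationTargetException e) {
                // The constructor ran and threw, e.g. because it rejects null.
            } catch (ReflectiveOperationException | RuntimeException e) {
                // Not invocable at all: abstract class, access refused, and so on.
            }
        }
        return null;
    }

    // A one-element array of type t holds t's default value: null, 0 or false.
    private static Object defaultValue(Class<?> t) {
        return Array.get(Array.newInstance(t, 1), 0);
    }
}
```

A class whose constructor happens to tolerate defaults is built fine; one that validates its arguments, like A, is not, and reflection offers no general way around that.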
As a final consideration, serialization is not simply the process of saving and restoring fields. It’s something that must be carefully planned and implemented during the design of a class. A class must be designed to be serializable.
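As one illustration of a class designed for serialization, here is a hypothetical Interval class (not from the original discussion) that opts in through java.io.Serializable and uses a private readObject method, the hook Java’s built-in serialization calls while reading an instance back, to re-validate its invariant instead of trusting the stream.

```java
import java.io.IOException;
import java.io.InvalidObjectException;
import java.io.ObjectInputStream;
import java.io.Serializable;

// Hypothetical example: opts in to serialization and re-checks low < high on read.
public final class Interval implements Serializable {
    private static final long serialVersionUID = 1L;

    private final int low;
    private final int high;

    public Interval(int low, int high) {
        if (low >= high) {
            throw new IllegalArgumentException(low + " >= " + high);
        }
        this.low = low;
        this.high = high;
    }

    public int low() { return low; }
    public int high() { return high; }

    // ObjectInputStream does not run Interval's constructor; it calls this hook instead,
    // so the invariant is enforced here against corrupted or forged streams.
    private void readObject(ObjectInputStream in) throws IOException, ClassNotFoundException {
        in.defaultReadObject();
        if (low >= high) {
            throw new InvalidObjectException("low >= high");
        }
    }
}
```

Java’s own serialization needs this kind of hook precisely because, like the Unsafe approach in the question, it creates the instance without running the class’s constructor.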
REPLY TO EDIT
It would not be fair to call the code you provide "pure Java", since it uses the non-standard API "sun.misc.Unsafe", which is present in Sun’s implementation but not guaranteed to be present in all implementations. The code therefore depends on the implementation.
Your test code also assumes knowledge about the class, i.e., you use getDeclaredField("o"). Anyway, I think this could easily be fixed.
However, I see two problems.
Thou shalt not serialize system resources
First of all, suppose I have a class like this:
How can you possibly serialize a Thread? What would be the semantics of the serialized object? What if the thread is in the middle of an I/O operation, like reading from a socket? How would you serialize the socket connection?! It makes no sense. And this class is quite an ordinary one.
Thou shalt not share deserialized instances without synchronization, even if the class is perfectly thread-safe
Let’s forget about semantics, go back to the language specification, and find another problem with your approach (edit: changed the class to make the point even stronger). Consider the following class, which represents a mutable range of integers:
A very simple, innocent-looking class, right? Note that an int[] is used as the return type of the getter because, with a pair of separate getters, the values of a and b might change between the two calls. So this class is perfectly thread-safe: under "normal" circumstances, it can never be in a state where "a >= b".
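For reference, a class matching that description might look like the following hypothetical sketch. Only the names Range, a and b come from the text; the method names set and get are assumptions. Both methods hold the instance lock, and the constructor goes through set, so even the initial bounds are written under the lock.

```java
// Hypothetical sketch of a thread-safe mutable range that always keeps a < b.
public final class Range {
    private int a; // guarded by this
    private int b; // guarded by this

    public Range(int a, int b) {
        set(a, b); // class is final, so calling set here cannot be overridden
    }

    // Updates both bounds atomically, refusing any state with a >= b.
    public synchronized void set(int a, int b) {
        if (a >= b) {
            throw new IllegalArgumentException(a + " >= " + b);
        }
        this.a = a;
        this.b = b;
    }

    // One call returns both bounds, so a caller never mixes old and new values.
    public synchronized int[] get() {
        return new int[] {a, b};
    }
}
```

A deserializer that writes a and b directly, without taking the lock, bypasses every one of these guarantees.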
With the deserialization technique proposed by the OP, this guarantee disappears. Suppose the OP gave me two methods, "Object serialize(Object o)" and "Object deserialize(Object o)", which use the proposed algorithm. The following pseudo-code shows that it does not work:
What will it print? First of all, it can print null, if the write to the field r is not seen by T2. To make things more interesting (and to see how subtle this can get), let’s suppose T2 actually sees this write to the field r. Since the deserialization process provides no synchronization, the JVM is free to reorder the writes to the fields inside the newly deserialized Range instance as it sees fit. So it can print "0 < 0" if T2 sees none of the writes to a and b, "1 < 0" if it sees only the write to a, "0 < 3" if it sees only the write to b, or "1 < 3". According to the Java Language Specification, you cannot predict the outcome (your only guarantee is that it must be one of these five possibilities).
So, the point is: you cannot make this work reliably for every single class. I can always hide a lock acquisition, and you wouldn’t be able to trace it (without some serious, hardcore, (impossible?) bytecode analysis), so the deserialized instance would not be seen equally by every thread… can you see the huge problems that can appear?
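The missing synchronization has to be supplied by whoever shares the instance. A minimal sketch, assuming deserialization happens on one thread and the result is handed to another: passing the instance through a java.util.concurrent.BlockingQueue gives the happens-before edge that a plain field write does not, because the queue’s documented memory-consistency effects make writes made before put visible after take. Holder and deserializedRange are hypothetical stand-ins for a constructor-bypassing deserializer’s output; this fixes publication only, not the general problem described above.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public final class SafeHandoff {

    // Plain, unsynchronized fields, as a constructor-bypassing deserializer would fill them.
    static final class Holder {
        int a;
        int b;
    }

    // Hypothetical stand-in for deserialize(serialize(new Range(1, 3))).
    static Holder deserializedRange() {
        Holder h = new Holder();
        h.a = 1;
        h.b = 3;
        return h;
    }

    public static String handOff() throws InterruptedException {
        BlockingQueue<Holder> queue = new ArrayBlockingQueue<>(1);
        String[] seen = new String[1];
        Thread t2 = new Thread(() -> {
            try {
                Holder h = queue.take(); // happens-after the put below
                seen[0] = h.a + " < " + h.b;
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        t2.start();
        queue.put(deserializedRange()); // writes to a and b happen-before the take
        t2.join(); // join makes seen[0] visible to this thread
        return seen[0];
    }
}
```

With the queue in place, T2 can only print "1 < 3"; replace it with a plain shared field and all the outcomes listed above become legal again.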
To sum up…
Such a framework cannot exist. You will have problems with security managers (the use of setAccessible(true)), with the portability of your code (the use of sun.misc.Unsafe), with multi-threading (class Range), and with nonsensical, unusable deserialized instances (class StockQuoteProvider). These are only the first four problems I could come up with, and they cannot be solved with pure Java code without making any assumptions about the objects being serialized.
So the conclusion is that you must restrict the objects your framework will be able to serialize. In other words, the objects must be designed to be serializable.
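One way to impose such a restriction, sketched under the assumption that the framework reuses Java’s own java.io.Serializable marker as the opt-in signal (a dedicated annotation would work just as well): refuse any object whose class has not opted in, before touching any of its fields. The class name OptInCheck is hypothetical, and a real framework would apply the same check recursively to every field value it visits.

```java
import java.io.Serializable;

public final class OptInCheck {

    // Throws unless the object's class explicitly opted in to serialization.
    public static void requireSerializable(Object o) {
        if (o != null && !(o instanceof Serializable)) {
            throw new IllegalArgumentException(
                    o.getClass().getName() + " is not designed to be serialized");
        }
    }
}
```

java.lang.Thread does not implement Serializable, so a class like the one holding a Thread would be rejected up front instead of producing a meaningless copy.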
Good luck.