I must serialize a huge tree of objects (7,000) to disk. Originally we kept this tree in a database with Kodo, but loading it into memory took thousands upon thousands of queries and a good part of the universe's remaining lifetime.
I tried Java serialization for this and it did improve performance. However, I get the feeling that I could improve it further by writing my own custom serialization code. I need to make loading this serialized tree as fast as possible.
On my machine, serializing / deserializing these objects takes about 15 seconds. Loading them from the database takes around 40 seconds.
Any tips on what I could do to improve this performance, taking into consideration that, because the objects are in a tree, they reference each other?
One optimization is customizing the class descriptors, so that you store the class descriptors in a different database and in the object stream you only refer to them by ID. This reduces the space needed by the serialized data. See for example how in one project the classes SerialUtil and ClassesTable do it.
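A minimal sketch of that idea, assuming the class table is just an in-memory list (a stand-in for the separate database; it is not the SerialUtil/ClassesTable code). It relies on the standard hooks ObjectOutputStream.writeClassDescriptor and ObjectInputStream.readClassDescriptor. The names CompactOut, CompactIn, CLASS_TABLE and Node are made up for illustration.

```java
import java.io.*;
import java.util.*;

class CompactDescriptorDemo {
    // Hypothetical class table. In real use it would be persisted separately,
    // and must hold the same IDs when the stream is read back.
    static final List<String> CLASS_TABLE = new ArrayList<>();

    static synchronized int idFor(String className) {
        int id = CLASS_TABLE.indexOf(className);
        if (id < 0) {
            CLASS_TABLE.add(className);
            id = CLASS_TABLE.size() - 1;
        }
        return id;
    }

    // Writes a 4-byte ID instead of the full descriptor (name, UID, fields).
    static class CompactOut extends ObjectOutputStream {
        CompactOut(OutputStream out) throws IOException { super(out); }

        @Override
        protected void writeClassDescriptor(ObjectStreamClass desc) throws IOException {
            writeInt(idFor(desc.getName()));
        }
    }

    // Maps the ID back to the local class and uses its local descriptor.
    static class CompactIn extends ObjectInputStream {
        CompactIn(InputStream in) throws IOException { super(in); }

        @Override
        protected ObjectStreamClass readClassDescriptor()
                throws IOException, ClassNotFoundException {
            String name;
            synchronized (CompactDescriptorDemo.class) {
                name = CLASS_TABLE.get(readInt());
            }
            return ObjectStreamClass.lookupAny(Class.forName(name));
        }
    }

    // Illustrative tree node, not taken from the question.
    static class Node implements Serializable {
        private static final long serialVersionUID = 1L;
        final String name;
        final List<Node> children = new ArrayList<>();
        Node(String name) { this.name = name; }
    }

    static byte[] write(Object o, boolean compact) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out =
                 compact ? new CompactOut(bytes) : new ObjectOutputStream(bytes)) {
            out.writeObject(o);
        }
        return bytes.toByteArray();
    }

    static Object readCompact(byte[] data) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new CompactIn(new ByteArrayInputStream(data))) {
            return in.readObject();
        }
    }
}
```

The trade-off is that the stream no longer carries the serialVersionUID or field list, so the reader trusts that its local classes match the ones that were written. Since each class descriptor is written only once per stream, the saving is mostly in size and matters most when many small streams are stored.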
Making classes Externalizable instead of Serializable can give some performance benefits. The downside is that it requires lots of manual work.
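To give an idea of the manual work involved, here is a rough sketch of an Externalizable tree node. TreeNode and its fields are hypothetical, since the question does not show the real classes. The parent reference is not written at all; it is rebuilt while reading, which is one way to handle nodes that point back at each other.

```java
import java.io.*;
import java.util.*;

class TreeNode implements Externalizable {
    int id;
    String name;                           // assumed non-null (writeUTF rejects null)
    TreeNode parent;                       // not serialized; restored in readExternal
    List<TreeNode> children = new ArrayList<>();

    public TreeNode() {}                   // Externalizable needs a public no-arg constructor

    TreeNode(int id, String name) {
        this.id = id;
        this.name = name;
    }

    void add(TreeNode child) {
        child.parent = this;
        children.add(child);
    }

    @Override
    public void writeExternal(ObjectOutput out) throws IOException {
        out.writeInt(id);
        out.writeUTF(name);
        out.writeInt(children.size());
        for (TreeNode child : children) {
            out.writeObject(child);        // goes through the stream, so shared nodes stay shared
        }
    }

    @Override
    public void readExternal(ObjectInput in) throws IOException, ClassNotFoundException {
        id = in.readInt();
        name = in.readUTF();
        int count = in.readInt();
        children = new ArrayList<>(count);
        for (int i = 0; i < count; i++) {
            TreeNode child = (TreeNode) in.readObject();
            child.parent = this;           // rebuild the back-reference
            children.add(child);
        }
    }

    static byte[] toBytes(TreeNode root) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(root);
        }
        return bytes.toByteArray();
    }

    static TreeNode fromBytes(byte[] data) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return (TreeNode) in.readObject();
        }
    }
}
```

When reading from or writing to a file, wrapping the file stream in a BufferedInputStream or BufferedOutputStream is usually worth doing as well.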
Then there are other serialization libraries, for example jserial, which can give better performance than Java’s default serialization. Also, if the object graph does not include cycles, then it can be serialized a little bit faster, because the serializer does not need to keep track of objects it has seen (see ‘How does it work?’ in jserial’s FAQ).
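If the data really is a strict tree (each node has exactly one parent, nothing is shared), a hand-written format can skip object tracking entirely. The following is only a sketch of that idea using plain DataOutputStream / DataInputStream. It does not use jserial's API, and PlainTree is a made-up class.

```java
import java.io.*;
import java.util.*;

class PlainTree {
    final String label;
    final List<PlainTree> kids = new ArrayList<>();

    PlainTree(String label) { this.label = label; }

    // Pre-order dump: label, child count, then each child in turn.
    // No handle table is kept, so a node reachable twice would be written twice
    // and a cycle would recurse forever.
    void write(DataOutputStream out) throws IOException {
        out.writeUTF(label);
        out.writeInt(kids.size());
        for (PlainTree kid : kids) {
            kid.write(out);
        }
    }

    static PlainTree read(DataInputStream in) throws IOException {
        PlainTree node = new PlainTree(in.readUTF());
        int count = in.readInt();
        for (int i = 0; i < count; i++) {
            node.kids.add(read(in));
        }
        return node;
    }

    static byte[] toBytes(PlainTree root) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            root.write(out);
        }
        return bytes.toByteArray();
    }

    static PlainTree fromBytes(byte[] data) throws IOException {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            return read(in);
        }
    }
}
```

The recursion depth equals the depth of the tree, so a very deep or degenerate tree would need an explicit stack instead.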