I’ve just discovered the HDF5 format and I’m considering using it to store 3D data spread over a cluster of Java application servers. I have found out that there are several implementations available for Java, and would like to know the differences between them:
- Java HDF5 Interface (JHI5): the Java wrapper from the HDF Group itself.
- Nujan: a pure Java NetCDF4 and HDF5 writer (cannot read HDF5).
Most importantly, I would like to know:
- How much of the native API is covered? Are there any limitations that do not exist in the native API?
- Is there support for “Parallel HDF5”?
- Once my 3D data is loaded, do I incur a “native call overhead” each time I access one element of a 3D array? That is, does the data actually get turned into Java objects, or does it stay in “native/JNI memory”?
- Are there any known stability problems with a particular implementation, since a crash in native code normally takes the whole JVM down?
HDF Java follows a layered approach:
- JHI5: the low-level JNI wrappers; very flexible, but also quite tedious to use.
- Java HDF object package: a high-level interface based on JHI5.
- HDFView: a Java-based viewer application built on the Java HDF object package.
JHDF5 provides a high-level interface that builds on the JHI5 layer and exposes most of the functionality of HDF5 to Java. The API has a shallow learning curve and hides most of the housekeeping work from the developer. You can run the Java HDF object package (and HDFView) on the JHI5 interface that is part of JHDF5, so the two APIs can coexist within one Java program.
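On the question of per-element overhead, here is a rough sketch in plain Java rather than any library's actual API. It assumes the wrapper has already copied a whole 3D dataset into a flat primitive array on the Java heap, laid out in row-major (C) order, which is the order HDF5 uses for dataset dimensions. The class `Flat3D` and its methods are hypothetical names made up for this illustration; whether a given wrapper returns data in exactly this form should be checked against its documentation.

```java
/**
 * Hypothetical helper (not part of JHI5 or JHDF5): a 3D view over a flat,
 * row-major float array that already lives on the Java heap.
 */
final class Flat3D {
    private final float[] data;
    private final int nx, ny, nz;

    Flat3D(float[] data, int nx, int ny, int nz) {
        if (data.length != nx * ny * nz) {
            throw new IllegalArgumentException("array length does not match dimensions");
        }
        this.data = data;
        this.nx = nx;
        this.ny = ny;
        this.nz = nz;
    }

    /** Row-major offset: the last index (k) varies fastest. */
    int index(int i, int j, int k) {
        return (i * ny + j) * nz + k;
    }

    /** Plain array access; no JNI call is involved at this point. */
    float get(int i, int j, int k) {
        return data[index(i, j, k)];
    }

    public static void main(String[] args) {
        // Stand-in for data that a wrapper would have read in one bulk call.
        float[] flat = new float[2 * 3 * 4];
        for (int n = 0; n < flat.length; n++) {
            flat[n] = n;
        }
        Flat3D a = new Flat3D(flat, 2, 3, 4);
        System.out.println(a.get(1, 2, 3)); // prints 23.0
    }
}
```

Under that assumption, the native call cost is paid once per bulk read, and every later element access is an ordinary Java array lookup.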
Permafrost and Nujan seem far from complete at this point, and Permafrost hasn’t seen much activity recently, so neither looks like a first choice right now.
I think a good path for you is to look at both the Java HDF object package and JHDF5, decide which of the two APIs fits your needs better, and go with that one.
Disclaimer: I have worked on the JHDF5 interface, so I may be biased.