I am currently starting a project titled “Cloud computing for time series mining algorithms using Hadoop”.
The data which I have is hdf files of size over a terabyte.In hadoop as I know that we should have text files as input for further processing (map-reduce task). So I have one option that I convert all my .hdf files to text files which is going to take a lot of time.
Or I find a way of how to use raw hdf files in map reduce programmes.
So far I have not been successful in finding any java code which reads hdf files and extract data from them.
If somebody has a better idea of how to work with hdf files I will really appreciate such help.
Thanks
Ayush
For your first option, you could use a conversion tool like HDF dump to dump HDF file to text format. Otherwise, you can write a program using Java library for reading HDF file and write it to text file.
For your second option, SciHadoop is a good example of how to read Scientific datasets from Hadoop. It uses NetCDF-Java library to read NetCDF file. Hadoop does not support POSIX API for file IO. So, it uses an extra software layer to translate POSIX call of NetCDF-java library to HDFS(Hadoop) API calls. If SciHadoop does not already support HDF files, you might go along a little harder path and develop a similar solution yourself.