I realize that an SO user has asked this question before, but that was in 2009, and I was hoping that more is known about HDF5 by now or that newer versions have fixed this particular issue. To restate the question as it applies to my own problem:
I have a gigantic file of nodes and elements from a large geometry and have already retrieved all the useful information I need from it. Therefore, in Python, I am trying to keep the original file, but delete the information I do not need and fill in more information from other sources. For example, I have a dataset of nodes that I don’t need. However, I need to keep the neighboring dataset and include information about their indices from an outside file. Is there any way to delete these specific datasets?
Or does the old idea of having “placekeepers” in the HDF5 file still hold true, such that no one knows how to remove information, or bothers with it? I’m not too worried about the empty space, as long as it is faster to simply remove and add information than to create an entirely new file.
Note: I’m using h5py’s ‘r+’ mode to read and write.
Removing entire nodes (groups or datasets) from an HDF5 file should be no problem.
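As a minimal sketch of what that removal could look like with h5py in 'r+' mode: the file name and the dataset names (`nodes`, `elements`, `element_indices`) are made up for illustration, and a small stand-in file is created first so the snippet is self-contained.

```python
import os
import tempfile

import h5py
import numpy as np

# Hypothetical file and dataset names, used only for illustration.
path = os.path.join(tempfile.mkdtemp(), "geometry.h5")

# Build a small stand-in for the original file.
with h5py.File(path, "w") as f:
    f.create_dataset("nodes", data=np.arange(12.0).reshape(4, 3))
    f.create_dataset("elements", data=np.array([[0, 1, 2], [1, 2, 3]]))

# Reopen in read/write mode, drop one dataset, and add a new one.
with h5py.File(path, "r+") as f:
    del f["nodes"]  # unlinks the dataset from the file
    f.create_dataset("element_indices", data=np.array([10, 11]))
    remaining = sorted(f.keys())

print(remaining)
```

Deleting this way only unlinks the object, so the file on disk should not be expected to get smaller.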
However, if you want to reclaim the space, you have to run the h5repack tool, as described in the HDF5 docs.
Alternatively, you can also have a look at PyTables's ptrepack tool.
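Both repacking tools are command-line programs that write a compacted copy to a new file. A rough sketch of driving one from Python follows; the helper names `repack_command` and `repack` and the file names are hypothetical, and it assumes the basic `tool source destination` invocation with no extra options.

```python
import shutil
import subprocess


def repack_command(src, dst, tool="h5repack"):
    """Build the argument list for a plain 'tool src dst' invocation."""
    return [tool, src, dst]


def repack(src, dst, tool="h5repack"):
    """Run the repacking tool, failing early if it is not installed."""
    if shutil.which(tool) is None:
        raise FileNotFoundError(f"{tool} not found on PATH")
    # check=True raises CalledProcessError if the tool exits with an error.
    subprocess.run(repack_command(src, dst, tool), check=True)


# Hypothetical file names; this only builds the command, it does not run it.
cmd = repack_command("geometry.h5", "geometry_repacked.h5")
print(cmd)
```

The original file is left untouched, so it can be replaced with the repacked copy once the result has been checked.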
PyTables should be able to read h5py HDF5 files, and the ptrepack tool is similar to h5repack.

If you want to remove records from a dataset, then you probably have to retrieve the records you want to keep, create a new dataset, and remove the old one.
PyTables supports removing rows; however, it’s not recommended.
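The retrieve-and-recreate approach for dropping records could be sketched in h5py roughly as follows. The dataset name `nodes` and the filter condition are placeholders, and the sketch assumes the dataset fits in memory.

```python
import os
import tempfile

import h5py
import numpy as np

# Hypothetical file and dataset names, used only for illustration.
path = os.path.join(tempfile.mkdtemp(), "records.h5")

with h5py.File(path, "w") as f:
    f.create_dataset("nodes", data=np.arange(10))

with h5py.File(path, "r+") as f:
    data = f["nodes"][...]        # read every record into memory
    kept = data[data % 2 == 0]    # placeholder filter: keep even values
    del f["nodes"]                # remove the old dataset
    f.create_dataset("nodes", data=kept)  # recreate it with the kept records
    result = f["nodes"][...].tolist()

print(result)
```

Any attributes on the old dataset are not carried over by this sketch and would need to be copied separately.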