I realize that a SO user has formerly asked this question but it was

Question


Asked: June 6, 2026 (2026-06-06T12:17:19+00:00)


I realize that an SO user has already asked this question, but it was asked in 2009, and I was hoping that more is now known about HDF5 or that newer versions have fixed this particular issue. To restate the question as it applies to my own problem:

I have a gigantic file of nodes and elements from a large geometry and have already retrieved all the useful information I need from it. Therefore, in Python, I am trying to keep the original file but delete the information I do not need and fill in more information from other sources. For example, I have a dataset of nodes that I don’t need. However, I need to keep the neighboring dataset and include information about their indices from an outside file. Is there any way to delete these specific datasets?

Or does the old idea of having “placekeepers” in the HDF5 file still hold true, such that no one knows how to remove information, or bothers to? I’m not too worried about the empty space, as long as it is faster to simply remove and add information than to create an entirely new file.

Note: I’m using h5py’s ‘r+’ mode to read and write.


1 Answer

Editorial Team · Answer 1 · 2026-06-06T12:17:21+00:00

Removing entire nodes (groups or datasets) from an HDF5 file should be no problem.
However, if you want to reclaim the space, you have to run the h5repack tool.
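
As a minimal sketch of the first point, assuming h5py and a file opened in 'r+' mode as in the question: deleting a name with `del` removes the link to the object, which is h5py's counterpart of H5Ldelete. The helper name `delete_dataset` and the dataset names in the usage note are made up for illustration.

```python
import h5py


def delete_dataset(path, name):
    """Unlink the object called `name` from the HDF5 file at `path`.

    Returns True if something was deleted and False if the name was absent.
    Only the link is removed; this alone does not repack the file on disk.
    """
    with h5py.File(path, "r+") as f:
        if name in f:
            del f[name]  # removes the link, like H5Ldelete
            return True
        return False
```

For example, `delete_dataset("mesh.h5", "nodes")` would drop a hypothetical "nodes" dataset and leave a sibling such as "neighbors" untouched. New data can then be added to the same file with `create_dataset` while it is open in 'r+' mode.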

From the HDF5 docs:

5.5.2. Deleting a Dataset from a File and Reclaiming Space

HDF5 does not at this time provide an easy mechanism to remove a
dataset from a file or to reclaim the storage space occupied by a
deleted object.

Removing a dataset and reclaiming the space it used can be done with
the H5Ldelete function and the h5repack utility program. With the
H5Ldelete function, links to a dataset can be removed from the file
structure. After all the links have been removed, the dataset becomes
inaccessible to any application and is effectively removed from the
file. The way to recover the space occupied by an unlinked dataset is
to write all of the objects of the file into a new file. Any unlinked
object is inaccessible to the application and will not be included in
the new file. Writing objects to a new file can be done with a custom
program or with the h5repack utility program.
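
The "custom program" route described in the quoted passage could look roughly like the sketch below, assuming h5py. It copies every object still linked from the root group into a fresh file, so anything that was unlinked is left behind. The function name `repack_by_copy` is made up for illustration. On the command line, `h5repack old.h5 new.h5` does the same job and is probably the safer choice, since this sketch does not try to preserve hard links shared between top-level objects.

```python
import h5py


def repack_by_copy(src_path, dst_path):
    """Write all objects reachable from the root of src_path into dst_path."""
    with h5py.File(src_path, "r") as src, h5py.File(dst_path, "w") as dst:
        # Attributes on the root group are not covered by the loop below.
        for key, value in src.attrs.items():
            dst.attrs[key] = value
        # Group.copy recurses into groups, so top-level names are enough.
        for name in src:
            src.copy(name, dst, name=name)
```

After checking the new file, it can replace the original. The copy still has to read and write all the remaining data, so it is not faster than building a new file by other means.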

Alternatively, you can have a look at PyTables's ptrepack tool.
PyTables should be able to read HDF5 files written by h5py, and ptrepack is similar to h5repack.

If you want to remove records from a dataset, then you probably have to retrieve the records you want to keep, create a new dataset from them, and remove the old one.
PyTables supports removing rows; however, it’s not recommended.
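
The keep-and-replace approach for records can be sketched with h5py and NumPy as below. This assumes the dataset fits in memory. The helper name `drop_rows` and its boolean `keep_mask` argument are made up for illustration. The new dataset is created with default settings, so any chunking or compression on the original would need to be passed to `create_dataset` again.

```python
import h5py
import numpy as np


def drop_rows(path, name, keep_mask):
    """Replace dataset `name` with a copy holding only rows where keep_mask is True."""
    with h5py.File(path, "r+") as f:
        old = f[name]
        kept = old[...][np.asarray(keep_mask, dtype=bool)]  # load, then filter
        attrs = dict(old.attrs)  # remember attributes before unlinking
        del f[name]              # unlink the old dataset
        new = f.create_dataset(name, data=kept)
        for key, value in attrs.items():
            new.attrs[key] = value
```

The space held by the old dataset is subject to the same caveat as any other deleted object, so repacking still applies if the file size matters.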
