I store (non-equidistant) time series as tables in HDF5 files using the H5TB API. The format looks like this:
time channel1 channel2
0.0 x x
1.0 x x
2.0 x x
There are also insertions of “detail data” like this:
time channel1 channel2
0.0 x x
1.0 x x
1.2 x x
1.4 x x
1.6 x x
1.8 x x
2.0 x x
Now I want to store the data in another data format, so I would like to “query” the HDF5 file like this:
select ch1 where time > 1.6 && time < 3.0
I thought of several ways to do this query:
- There is a built-in feature called B-Tree Index. Is it possible to use this for indexing the data?
- I do a binary search on the time channel and then read the channel values.
- I create an index myself (and update it whenever there is a detail insertion). What would be the best algorithm to use here?
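The binary-search option could look roughly like the sketch below. It assumes the time channel has already been read into memory as a sorted list and uses plain Python lists in place of the HDF5 table; `query_range`, `times` and `ch1` are hypothetical names used only for illustration.

```python
from bisect import bisect_left, bisect_right

def query_range(times, values, lo, hi):
    """Return the values whose timestamp t satisfies lo < t < hi.

    Assumes `times` is sorted in ascending order and has the same
    length as `values`.
    """
    start = bisect_right(times, lo)  # first row with time > lo
    stop = bisect_left(times, hi)    # first row with time >= hi
    return values[start:stop]

# Sample data shaped like the table with inserted detail data.
times = [0.0, 1.0, 1.2, 1.4, 1.6, 1.8, 2.0]
ch1 = ["a", "b", "c", "d", "e", "f", "g"]

# select ch1 where time > 1.6 && time < 3.0
print(query_range(times, ch1, 1.6, 3.0))  # ['f', 'g']
```

Because both bounds are strict, the lower bound uses `bisect_right` and the upper bound uses `bisect_left`. Each lookup costs O(log n) comparisons, and the sort order is all it relies on, so non-equidistant samples are not a problem.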
The main motivation for an index would be to have fast query responses.
What would you suggest here?
I finally found another (obvious) solution myself. The easiest way is to open the HDF5 file, read only the time channel, and build an in-memory map before reading the data channels. This could be optimized further by reading the time channel with a sparse (strided) hyperslab.
Once the row indices for a particular time range are known, the data channels can be read for just those rows.
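A rough sketch of the sparse-read idea, under some assumptions: `read_times(start, stop, step)` is a hypothetical callback standing in for a strided hyperslab read of the time column (here it just slices a Python list), and the time channel is sorted. Every `stride`-th timestamp forms a coarse index; a query then reads only the one block of timestamps that can contain the boundary.

```python
from bisect import bisect_left, bisect_right

def make_reader(times):
    """Stand-in for a strided read of the time column (hypothetical)."""
    return lambda start, stop, step: times[start:stop:step]

def locate(read_times, coarse, n_rows, stride, t, search):
    """Find a row boundary for time t using the coarse index.

    coarse[k] holds the timestamp of row k * stride.
    search=bisect_right gives the first row with time > t,
    search=bisect_left gives the first row with time >= t.
    """
    k = search(coarse, t)
    # The boundary lies between coarse samples k-1 and k (inclusive).
    block_start = max(k - 1, 0) * stride
    block_stop = min(k * stride + 1, n_rows)
    block = read_times(block_start, block_stop, 1)
    return block_start + search(block, t)

def select_rows(read_times, n_rows, stride, lo, hi):
    """Row range [start, stop) of samples with lo < time < hi."""
    coarse = read_times(0, n_rows, stride)  # the sparse read
    start = locate(read_times, coarse, n_rows, stride, lo, bisect_right)
    stop = locate(read_times, coarse, n_rows, stride, hi, bisect_left)
    return start, stop

times = [0.0, 1.0, 1.2, 1.4, 1.6, 1.8, 2.0]
reader = make_reader(times)
print(select_rows(reader, len(times), 3, 1.6, 3.0))  # (5, 7)
```

The returned row range can then be used to read only those rows of the data channels. In practice the coarse index would be built once and kept in memory, and would need to be rebuilt after a detail insertion, since inserted rows shift all later row numbers.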