I have to perform scientific experiments using time series.
I intend to use MySQL as the data storage platform.
I’m thinking of using the following set of tables to store the data:
Table1 -> ts_id (the time series identifier; I will have to deal with several time series)
Table2 -> ts_id, obs_date, value (should be indexed by {ts_id, obs_date})
Because there will be many time series (hundreds), each with possibly millions of observations, Table2 may grow very large.
The problem is that I have to replicate this experiment several times, so I’m not sure what would be the best approach:
- add an experiment_id to the tables and allow them to grow even more.
- create a separate database for each experiment.
If option 2 is better (I personally think so), what would be the best logical way to do this? I have many different experiments to perform, each needing replication. If I create a different database for every replication, I’d soon have hundreds of databases. Is there a way to organize them logically, for example with each replication as a “sub-database” of its experiment’s master database?
You might want to start out by considering how you will need to analyze your data.
Presumably your analysis will need to know about the experiment name, the experiment replica number, and any internal replicates (e.g. at each timepoint there are 3 “identical” subjects measured for each treatment). So your db schema might be something like this:
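A minimal sketch of one such layout, assuming four tables: experiment, replicate, time_series and observation. Only ts_id, obs_date, value and rep_id come from this thread; every other table and column name is hypothetical. SQLite (via Python's standard library) stands in for MySQL so the sketch runs without a server; the column types would need adjusting for MySQL.

```python
import sqlite3

# Assumed layout: one row per experiment, one row per replication of it,
# and a single observation table shared by all experiments and replicates.
# Names other than ts_id, obs_date, value and rep_id are illustrative only.
SCHEMA = """
CREATE TABLE experiment (
    exp_id INTEGER PRIMARY KEY,
    name   TEXT NOT NULL UNIQUE
);
CREATE TABLE replicate (
    rep_id INTEGER PRIMARY KEY,
    exp_id INTEGER NOT NULL REFERENCES experiment(exp_id),
    rep_no INTEGER NOT NULL,
    UNIQUE (exp_id, rep_no)
);
CREATE TABLE time_series (
    ts_id  INTEGER PRIMARY KEY,
    label  TEXT NOT NULL
);
CREATE TABLE observation (
    rep_id   INTEGER NOT NULL REFERENCES replicate(rep_id),
    ts_id    INTEGER NOT NULL REFERENCES time_series(ts_id),
    obs_date TEXT    NOT NULL,
    value    REAL    NOT NULL,
    PRIMARY KEY (rep_id, ts_id, obs_date)
);
"""

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce FKs by default
conn.executescript(SCHEMA)

# Tiny made-up data set: one experiment, two replicates, one series.
conn.execute("INSERT INTO experiment (exp_id, name) VALUES (1, 'demo')")
conn.executemany(
    "INSERT INTO replicate (rep_id, exp_id, rep_no) VALUES (?, 1, ?)",
    [(1, 1), (2, 2)],
)
conn.execute("INSERT INTO time_series (ts_id, label) VALUES (1, 'series A')")
conn.executemany(
    "INSERT INTO observation (rep_id, ts_id, obs_date, value) VALUES (?, ?, ?, ?)",
    [(1, 1, "2020-01-01", 1.0), (1, 1, "2020-01-02", 2.0), (2, 1, "2020-01-01", 1.5)],
)
conn.commit()

# Observations per replicate, all from the same table.
rows = conn.execute(
    "SELECT rep_id, COUNT(*) FROM observation GROUP BY rep_id ORDER BY rep_id"
).fetchall()
print(rows)
```

With this shape, a new experiment or replication is just a new row in experiment or replicate, so nothing has to be created per run.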
If you have internal replicates you’ll need another table to hold the internal replicate/subject relationship.
Don’t worry about your millions of rows. As long as you index sensibly, there are unlikely to be any problems. But if worse comes to worst, you can always partition your observation table (likely to be the largest) by rep_id.
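To illustrate what "index sensibly" could mean here, the sketch below assumes a composite key on (rep_id, ts_id, obs_date), so that a date-range slice of one series in one replicate is a single index range lookup. The table layout, the helper names and the synthetic data are assumptions, and SQLite again stands in for MySQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE observation (
        rep_id   INTEGER NOT NULL,
        ts_id    INTEGER NOT NULL,
        obs_date TEXT    NOT NULL,
        value    REAL    NOT NULL,
        PRIMARY KEY (rep_id, ts_id, obs_date)  -- assumed composite key
    )
""")

# Synthetic data: 3 replicates x 2 series x 10 days, value = rep_id * day.
data = [
    (rep, ts, f"2020-01-{day:02d}", float(rep * day))
    for rep in (1, 2, 3)
    for ts in (1, 2)
    for day in range(1, 11)
]
conn.executemany("INSERT INTO observation VALUES (?, ?, ?, ?)", data)
conn.commit()


def series_slice(conn, rep_id, ts_id, start, end):
    """Values of one series in one replicate over a date range.

    The WHERE clause matches the key's column order (rep_id, ts_id, obs_date).
    """
    cur = conn.execute(
        "SELECT obs_date, value FROM observation "
        "WHERE rep_id = ? AND ts_id = ? AND obs_date BETWEEN ? AND ? "
        "ORDER BY obs_date",
        (rep_id, ts_id, start, end),
    )
    return cur.fetchall()


def mean_by_replicate(conn, ts_id, start, end):
    """Mean of one series per replicate, compared in a single query."""
    cur = conn.execute(
        "SELECT rep_id, AVG(value) FROM observation "
        "WHERE ts_id = ? AND obs_date BETWEEN ? AND ? "
        "GROUP BY rep_id ORDER BY rep_id",
        (ts_id, start, end),
    )
    return cur.fetchall()


window = series_slice(conn, 2, 1, "2020-01-01", "2020-01-03")
means = mean_by_replicate(conn, 1, "2020-01-01", "2020-01-05")
print(window)
print(means)
```

The second query is the kind of cross-replicate comparison that becomes awkward once each replication lives in its own database.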