I’m planning a new side project that will ultimately involve analyzing data I collect. It’s mostly time series data with varying numbers of components (think relational database columns). The series will differ in the time periods they cover and the frequencies at which they are measured, so there isn’t much standardized information that could be combined into fewer tables. No single series will hold much data: a maximum of around 100,000 measurements (think rows), with an average of around 5,000. I expect at least 10,000 different time series (think tables).
I don’t anticipate needing many complex queries. Even if I did, nothing about this project is time-sensitive: it’s really just batch-style analysis, so I could do the complex work in software after selecting data from the database. For that reason I am also considering a NoSQL database such as MongoDB.
Can anyone advise me on whether MySQL or MongoDB would be a better choice? If MySQL, which storage engine? If neither, do you have a better suggestion? Also, if the number of tables jumps from 10,000 to 500,000 or more, does that change your answer?
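For a sense of scale, here is a rough back-of-envelope sketch that multiplies out the figures above. The 32 bytes per row is purely an assumption (an 8-byte timestamp plus three 8-byte values), and the function names are made up for illustration. Real on-disk size would depend on the engine, indexes, and the actual number of components per series.

```python
# Back-of-envelope sizing using the figures from the question.
# ASSUMPTION: 32 bytes per row (8-byte timestamp + three 8-byte values).
BYTES_PER_ROW = 32


def total_rows(num_series, avg_rows_per_series):
    """Total measurements across all series."""
    return num_series * avg_rows_per_series


def estimated_bytes(rows, bytes_per_row=BYTES_PER_ROW):
    """Raw payload size, ignoring indexes and storage overhead."""
    return rows * bytes_per_row


if __name__ == "__main__":
    for num_series in (10_000, 500_000):
        rows = total_rows(num_series, 5_000)
        gigabytes = estimated_bytes(rows) / 1e9
        print(f"{num_series:>7} series: {rows:,} rows, about {gigabytes:.1f} GB raw")
```

Under that assumption the total data volume is modest in both cases, so the number of tables (or collections) is likely to matter more than the number of rows.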
I would like to suggest a new DBMS called SciDB (scidb.org). Its developers say it isn’t a typical DBMS because it focuses more on scientific analytical processing. It is optimized specifically for time series data, and it can be tuned further to run in the cloud.
It is optimized for time series data because it stores the data in columns rather than rows, which allows fast access along the time dimension.
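To illustrate the general idea of row versus column layout, here is a minimal sketch in plain Python. It is not SciDB's actual storage format or API; the field names (`t`, `temp`, `pressure`), the sample values, and both helper functions are hypothetical, and the column version assumes the timestamps are sorted.

```python
from bisect import bisect_left

# Row-oriented: every measurement is stored as one complete record.
rows = [
    {"t": 0, "temp": 20.1, "pressure": 1013.2},
    {"t": 1, "temp": 20.4, "pressure": 1013.0},
    {"t": 2, "temp": 20.3, "pressure": 1012.8},
    {"t": 3, "temp": 20.9, "pressure": 1012.9},
    {"t": 4, "temp": 21.0, "pressure": 1013.1},
]

# Column-oriented: each field is stored as its own contiguous sequence.
columns = {
    "t": [r["t"] for r in rows],
    "temp": [r["temp"] for r in rows],
    "pressure": [r["pressure"] for r in rows],
}


def temp_range_rows(rows, t_start, t_end):
    """Scan every full record, even though only one field is needed."""
    return [r["temp"] for r in rows if t_start <= r["t"] < t_end]


def temp_range_columns(columns, t_start, t_end):
    """Locate the time window once, then slice only the needed column.

    ASSUMPTION: columns["t"] is sorted in ascending order.
    """
    lo = bisect_left(columns["t"], t_start)
    hi = bisect_left(columns["t"], t_end)
    return columns["temp"][lo:hi]


if __name__ == "__main__":
    print(temp_range_rows(rows, 1, 4))
    print(temp_range_columns(columns, 1, 4))
```

Both functions return the same values, but the column version never touches the `pressure` data, which is the kind of saving a column layout aims for when a query reads one component over a time window.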
You should check it out.
I used it to analyze data sampled at 2,000 samples per second over a period of months.