I am looking for a schema-less database to store roughly 10 TB of data on disk, ideally with a Python client. The suggested solution should be free for commercial use and have good performance for reads and writes.
The main goal here is to store time-series data, including more than a billion records, accessed by timestamp.
Data would be stored in the following scheme:
KEY –> “FIELD_NAME.YYYYMMDD.HHMMSS”
VALUE –> [v1, v2, v3, v4, v5, v6] (v1..v6 are just floats)
For instance, suppose that:
FIELD_NAME = “TOMATO”
TIME_STAMP = “20060316.184356”
VALUES = [72.34, -22.83, -0.938, 0.265, -2047.23]
I need to be able to retrieve VALUE (the entire array) given the combination of FIELD_NAME & TIME_STAMP.
The query VALUES[“TOMATO.20060316.184356”] would return the vector [72.34, -22.83, -0.938, 0.265, -2047.23]. Reads of arrays should be as fast as possible.
Yet, I also need a way to store (in-place) a scalar value within an array. Suppose that I want to assign the 1st element of TOMATO on timestamp 2006/03/16.18:43:56 to be 500.867. In such a case, I need to have a fast mechanism to do so, something like:
VALUES[“TOMATO.20060316.184356”][0] = 500.867 (this would update on disk)
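To make the desired interface concrete, here is a minimal in-memory sketch in Python. A plain dict stands in for the on-disk database, and `make_key` is a hypothetical helper written only for this illustration, not part of any library.

```python
from datetime import datetime


def make_key(field_name: str, ts: datetime) -> str:
    """Build a key of the form FIELD_NAME.YYYYMMDD.HHMMSS."""
    return f"{field_name}.{ts:%Y%m%d.%H%M%S}"


# A plain dict stands in for the on-disk store (assumption for illustration).
VALUES = {}

key = make_key("TOMATO", datetime(2006, 3, 16, 18, 43, 56))
VALUES[key] = [72.34, -22.83, -0.938, 0.265, -2047.23]

# Read the whole array for a FIELD_NAME / TIME_STAMP pair.
vector = VALUES["TOMATO.20060316.184356"]

# Overwrite a single element of the stored array.
VALUES[key][0] = 500.867
```

The real store would need to offer the same two operations (whole-array read by key, single-element write by key and index) against data on disk rather than in memory.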
Can something like MongoDB work? I will be using just one machine (no need for replication, etc.), running Linux.
CLARIFICATION: only one machine will be used to store the database. Yet, I need a solution that will allow multiple machines to connect to the same database and update/insert/read/write data to/from it.
MongoDB is likely a good choice in terms of performance, flexibility, and usability (it is easy to approach). However, large databases require careful planning, especially when it comes to backup and high availability. Without further insight into the project requirements, there is little to say about whether one machine is enough or not (look at replica sets and sharding if you need options to scale).
Update: based on your new information, this should be doable with MongoDB (test and evaluate it). Put simply, MongoDB can be the “MySQL” of the NoSQL databases: if you know SQL databases, you should be able to work with MongoDB easily, since it borrows a lot of ideas and concepts from the SQL world. Your data model is trivial, and the data can be easily stored and retrieved (not going into details). I suggest downloading MongoDB and walking through the tutorial.
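As a rough sketch of how the data model could map onto MongoDB from Python: store one document per key, with the key as `_id` and the floats in an array field. The field name `values`, the two helper functions, and the database and collection names are assumptions made for illustration. The commented lines show how the resulting dicts would be passed to PyMongo's `replace_one`, `find_one`, and `update_one`, which need a running server.

```python
def make_document(key, values):
    """One document per key; MongoDB indexes _id automatically."""
    return {"_id": key, "values": [float(v) for v in values]}


def make_element_update(index, new_value):
    """Update spec that overwrites one array element via dot notation."""
    return {"$set": {f"values.{index}": float(new_value)}}


key = "TOMATO.20060316.184356"
doc = make_document(key, [72.34, -22.83, -0.938, 0.265, -2047.23])
update = make_element_update(0, 500.867)

# With a running server and PyMongo installed, these would be used roughly as:
#   from pymongo import MongoClient
#   coll = MongoClient("mongodb://localhost:27017")["tsdb"]["values"]  # hypothetical names
#   coll.replace_one({"_id": key}, doc, upsert=True)  # insert or overwrite the array
#   coll.find_one({"_id": key})["values"]             # read the whole array
#   coll.update_one({"_id": key}, update)             # set values[0] = 500.867
```

The `$set` update is applied on the server, so the client does not have to read and rewrite the whole array to change one element. Whether this is fast enough at 10 TB on one machine is something to measure rather than assume.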