I am building an application that needs to store and handle large amounts of data, and I’m struggling with the question of which DB to use.
My requirements are:
- Handle up to ~100,000 insert commands a second (sometimes several at once from different threads). 100,000 is the peak; most of the time the rate would be between a few hundred and a few thousand.
- Store millions of records.
- Query the data as quickly as possible.
- Some of the data properties vary from entity to entity, which fits a non-relational database better than a relational one. However, the total number of possible properties is not huge, so they could be represented as columns in a relational database (if it’s much faster that way).
- Update commands will rarely occur.
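To make the trade-off in the fourth requirement concrete, here is a minimal sketch of the two layouts I have in mind: one nullable column per possible property, versus a single document-style field per entity. SQLite is used only as a stand-in so the example runs anywhere; the table names and the property names (color, weight, size) are made up for illustration.

```python
import json
import sqlite3

# Hypothetical property names; the real set would come from the application.
KNOWN_PROPERTIES = ["color", "weight", "size"]

def create_tables(conn):
    # Option A: one nullable column per possible property (relational layout).
    cols = ", ".join(f"{name} TEXT" for name in KNOWN_PROPERTIES)
    conn.execute(f"CREATE TABLE wide (id INTEGER PRIMARY KEY, {cols})")
    # Option B: one text column holding the properties as JSON (document-like layout).
    conn.execute("CREATE TABLE doc (id INTEGER PRIMARY KEY, props TEXT)")

def insert_entity(conn, entity_id, props):
    # Properties the entity does not have are stored as NULL in the wide table.
    values = [props.get(name) for name in KNOWN_PROPERTIES]
    placeholders = ", ".join("?" for _ in KNOWN_PROPERTIES)
    conn.execute(f"INSERT INTO wide VALUES (?, {placeholders})", [entity_id, *values])
    # The document layout stores only the properties that are present.
    conn.execute("INSERT INTO doc VALUES (?, ?)", (entity_id, json.dumps(props)))

conn = sqlite3.connect(":memory:")
create_tables(conn)
insert_entity(conn, 1, {"color": "red"})
insert_entity(conn, 2, {"weight": "5kg", "size": "L"})

wide_rows = conn.execute(
    "SELECT id, color, weight, size FROM wide ORDER BY id"
).fetchall()
doc_rows = [
    (entity_id, json.loads(props))
    for entity_id, props in conn.execute("SELECT id, props FROM doc ORDER BY id")
]
print(wide_rows)
print(doc_rows)
```

The wide table makes every property directly queryable and indexable but leaves many NULLs; the document column stays compact but pushes property lookups into the application or into database-specific JSON functions.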
Which DB would you recommend?
Thanks!
Update: The OS I’m using isn’t Windows. I thought that if SQL Server were the most recommended DB I might switch, but from your responses this is not the case.
Regarding the budget – I will start with the cheapest option and I guess that this will change once the company has more money and more users.
No one has recommended NoSQL databases. Are they really that bad for these kinds of requirements?
The answer depends on additional questions, such as how much you want to spend, what OS you are using, and what expertise you have in-house.
Databases that I know of that can handle such a massive scale include:
DB2, Oracle, Teradata, and SQL Server. MySQL may also be an option, though I’m not sure of its performance capabilities.
There are others, I’m sure, designed for handling data on the massive scale you are suggesting, and you may need to look into those, as well.
So, if your OS is not Windows, you can exclude SQL Server.
If you are going on the cheap, MySQL may be the option.
DB2 and Oracle are both mature database systems. If your system is a mainframe (IBM 370), I’d recommend DB2, but for Unix-based systems either may be an option.
I don’t know much about Teradata, but I know it is specifically designed for massive amounts of data, so it may be closer to what you are looking for.
A more complete list of choices can be found here: http://en.wikipedia.org/wiki/List_of_relational_database_management_systems
A decent comparison of databases is here: http://en.wikipedia.org/wiki/Comparison_of_relational_database_management_systems
100,000+ inserts a second is a huge number. No matter what you choose, you are looking at spending a fortune on hardware to handle this.
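Whichever database you pick, one general way to reduce the per-insert cost is to group rows into batches and commit once per batch rather than once per row. The following is only a rough sketch of that idea, assuming SQLite as a stand-in and a made-up events table; the function name and batch size are illustrative, and it says nothing about what throughput a particular product will actually reach.

```python
import sqlite3

def insert_batched(conn, rows, batch_size=1000):
    """Insert rows in batches, committing once per batch instead of once per row."""
    inserted = 0
    for start in range(0, len(rows), batch_size):
        batch = rows[start:start + batch_size]
        # The connection context manager commits on success and rolls back on error.
        with conn:
            conn.executemany(
                "INSERT INTO events (ts, payload) VALUES (?, ?)", batch
            )
        inserted += len(batch)
    return inserted

conn = sqlite3.connect(":memory:")
# Hypothetical table; the real schema depends on the application.
conn.execute("CREATE TABLE events (ts INTEGER, payload TEXT)")

rows = [(i, f"event-{i}") for i in range(2500)]
total = insert_batched(conn, rows)
count = conn.execute("SELECT COUNT(*) FROM events").fetchone()[0]
print(total, count)
```

In a multi-threaded application, the writers would typically push rows onto a queue and a small number of workers would drain it in batches like this, so the peak load turns into fewer, larger transactions.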