I am setting up a database that I anticipate will be quite large, used for calculations and data storage. It will be one table with maybe 10 fields, containing one primary key and two foreign keys to itself. I anticipate there will be about a billion records added daily.
Each record should be quite small, and I will primarily be doing inserts. With each insert I will need to do a simple update on one or two fields of a connected record. All queries should be relatively simple.
At what size will I start running into performance problems with sql-server? I’ve seen mention of vldb systems, but also heard they may be a real pain. Is there a threshold where I should start looking at that? Is there a better db than sql-server that is designed for this sort of thing?
When talking about transaction rates of over 10k/sec you shouldn’t be asking advice on forums… This is close to TPC-C benchmark performance on 32 and 64 ways, which cost millions to tune up.
At what size will you be running into problems?
With a good data model and schema design, a properly tunned and with correct capacity planned server will not run into problems for 1 bil. records per day. The latest published SQL Server benchmarks are at about 1.2 mil tran/min. That is roughtly 16k transactions per second, at system priced at USD ~6 mil in 2005 (64 way Superdome). To achieve 10k tran/sec for your planned load you’re not going to need a Superdome, but you are going to need a quite beefy system (at least 16 way probably) and specially a very very good I/O subsytem. When doing back of the envelope capacity planning one usualy considers about 1K tran/sec per HBA and 4 CPU cores to feed the HBA. And you’re going to need quite a few database clients (application mid-tiers) just to feed 1 bil. records per day into the database. I’m not claiming that I did your capacity planning here, but I just wanted to give you a ballpark of what are we talking about. This is a multi-million dollars project, and something like this is not designed by asking advice on forums.