I am dealing with an application designed to import huge amounts of data into a Microsoft SQL Server 2000 database. The application seems to take an awfully long time to complete, and I suspect the application design is flawed. Someone asked me to dig into the application to find and fix serious bottlenecks, if any. I would like a structured approach to this job and have decided to prepare a checklist of potential problems to look for. I have some experience with SQL databases and have so far written down some things to look for.
But some outside inspiration would be very helpful as well. Can any of you point me to good resources on checklists for database schema design and database application design?
I plan on developing checklists for the following main topics:
- Database hardware – Is the server hardware adequate? This is the first thing to establish.
- Database configuration – Is the database configured for optimal performance?
- Database schema – Does the database schema have a sound design?
- Database application – Does the application use sound algorithms?
Good start. Here are the recommended priorities.
First principle: the import should do little or no processing beyond reading the source files and inserting rows. All other processing should be done before the load.
Application Design is #1. Do the applications do as much work as possible on the flat files before attempting the load? This is the secret sauce in large data warehouse loads: prepare offline, then bulk load the rows.
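As an illustration of "prepare offline", here is a minimal sketch of a pre-load step that validates and normalizes a source file so the load itself only inserts rows. The three-column layout (`id,name,amount`) and the function name `prepare_load_file` are hypothetical. The output is tab-delimited because tab is the default field terminator for bcp's character mode (`-c`).

```python
import csv
import io
from decimal import Decimal, InvalidOperation


def prepare_load_file(source, target, rejects):
    """Validate and normalize rows offline so the load only has to insert.

    Hypothetical layout: each source row is id,name,amount.
    Good rows are written tab-delimited to `target`; bad rows are
    appended to the `rejects` list. Returns (loaded, rejected) counts.
    """
    loaded = rejected = 0
    for row in csv.reader(source):
        try:
            if len(row) != 3:
                raise ValueError("wrong column count")
            row_id = int(row[0])          # ValueError if not an integer
            name = row[1].strip()
            amount = Decimal(row[2])      # InvalidOperation if not a number
            # Empty names, or names containing delimiters, would break the load file
            if not name or any(c in name for c in "\t\r\n"):
                raise ValueError("bad name")
        except (ValueError, InvalidOperation):
            rejects.append(row)
            rejected += 1
            continue
        target.write(f"{row_id}\t{name}\t{amount}\n")
        loaded += 1
    return loaded, rejected


if __name__ == "__main__":
    src = io.StringIO("1,Alice,12.50\n2,,3\nx,Bob,1\n")
    out, bad = io.StringIO(), []
    print(prepare_load_file(src, out, bad))  # (1, 2)
    print(repr(out.getvalue()))              # '1\tAlice\t12.50\n'
```

With real files, open the source with `open(path, newline="")` as the `csv` module recommends. The rejected rows can then be reviewed and fixed without ever touching the database.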
Database Schema is #2. Do you have the right tables and the right indexes? A load doesn't need any indexes. Usually you want to drop the indexes before the load and rebuild them afterward.
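One way to script the drop-and-rebuild step is to generate the T-SQL from a list of index definitions. This is a sketch with hypothetical table and index names. It uses the SQL Server 2000 `DROP INDEX table.index` form, drops nonclustered indexes before the clustered one, and recreates the clustered index first, so that no nonclustered index gets rebuilt twice. It covers plain indexes only; indexes that back PRIMARY KEY or UNIQUE constraints must be handled with `ALTER TABLE ... DROP CONSTRAINT` instead.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class IndexDef:
    name: str
    columns: List[str]
    clustered: bool = False
    unique: bool = False


def index_scripts(table: str, indexes: List[IndexDef]) -> Tuple[List[str], List[str]]:
    """Return (drop_statements, create_statements) to wrap around a bulk load."""
    # Drop nonclustered first: dropping the clustered index would otherwise
    # force every nonclustered index to be rebuilt.
    drop_order = sorted(indexes, key=lambda ix: ix.clustered)
    # Create clustered first: creating it later would rebuild the nonclustered ones.
    create_order = sorted(indexes, key=lambda ix: not ix.clustered)

    drops = [f"DROP INDEX {table}.{ix.name}" for ix in drop_order]
    creates = []
    for ix in create_order:
        kind = ("UNIQUE " if ix.unique else "") + (
            "CLUSTERED" if ix.clustered else "NONCLUSTERED"
        )
        cols = ", ".join(ix.columns)
        creates.append(f"CREATE {kind} INDEX {ix.name} ON {table} ({cols})")
    return drops, creates
```

Run the drop statements, bulk load, then run the create statements; building each index once over the full table is usually far cheaper than maintaining it row by row during the load.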
A load had best not require any triggers. All of that trigger processing can be done offline while preparing the file for the load.
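For example, suppose a hypothetical INSERT trigger computed a line total and stamped each row with a load batch number. This sketch does the same work per row while the load file is being written, so the table needs no trigger at all. The column layout and `add_derived_columns` are assumptions for illustration.

```python
from decimal import Decimal
from typing import Iterable, Iterator, Tuple


def add_derived_columns(
    rows: Iterable[Tuple[str, str, str]], batch_id: int
) -> Iterator[Tuple[str, ...]]:
    """Compute offline what a hypothetical INSERT trigger would compute.

    Input rows are (order_id, qty, unit_price) strings; each output row
    gains line_total = qty * unit_price and the load batch id.
    """
    for order_id, qty, unit_price in rows:
        line_total = int(qty) * Decimal(unit_price)  # exact decimal arithmetic
        yield (order_id, qty, unit_price, str(line_total), str(batch_id))
```

The generator can feed the same tab-delimited writer used for validation, so derived values arrive in the file ready to insert.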
A load had best not be done as a stored procedure. You want to use a simple bulk-load utility from Microsoft, such as bcp, to load the rows.
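If the load is driven from a script, the bcp call itself can stay trivial. This sketch only builds the argument list (the database, table, file, and server names are hypothetical) using standard bcp switches: `-c` character mode, `-T` trusted connection, `-S` server, `-b` rows per committed batch, `-h "TABLOCK"` table-lock hint (one of the preconditions for a minimally logged bulk load), and optionally `-e` for an error file.

```python
import subprocess
from typing import List, Optional


def bcp_in_command(table: str, data_file: str, server: str,
                   batch_size: int = 10000,
                   error_file: Optional[str] = None) -> List[str]:
    """Build a bcp argument list for a character-mode bulk load."""
    args = [
        "bcp", table, "in", data_file,
        "-c",                    # character mode; tab is the default field terminator
        "-T",                    # trusted (Windows authentication) connection
        "-S", server,
        "-b", str(batch_size),   # commit every batch_size rows
        "-h", "TABLOCK",         # table-lock hint for the bulk load
    ]
    if error_file:
        args += ["-e", error_file]  # rows bcp cannot load are written here
    return args


def run_bcp(args: List[str]) -> None:
    # check=True raises CalledProcessError if bcp exits with a non-zero status
    subprocess.run(args, check=True)
```

Passing a list rather than a shell string avoids quoting problems with paths and hints.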
Configuration is #3. It matters, but much, much less than schema design and application design.
Hardware is #4. Unless you have money to burn, you won't get far here. If, after everything else, you can prove that hardware is the bottleneck, then spend money.