I’m writing a database front-end that so far relies on a data access layer between the client UI and the database. As an SQL noob (and since this isn’t a high-security application) I’m happy with the DAL as an alternative to writing all the database logic in stored procedures.
My concern at the moment is speed. I understand that SQL Server will compile stored procedures and execute them a lot faster than random queries, but I can’t find any info on how much of an impact this actually has on performance.
I know that if I were designing SQL Server, I would at least consider transparently caching common queries, as well as dynamically storing and compiling procedures as needed to accommodate common queries. I imagine that this would largely eliminate the need for stored procedures in many scenarios (except for encapsulating the database, obviously).
But at the moment my database is pretty small so I can’t really tell how well it’s performing. Am I on the wrong path here? Will I end up painfully migrating all the logic to stored procedures once the database grows to millions of records and I discover the error of my ways?
On a related note, how does the performance of SELECT scale with table size? I can’t seem to find any resources on this, but it seems crucial to deciding how to lay out my database. Do I shove all my entries into one table, say, and rely on WHERE to isolate them in logical groups, or do I group entries into separate tables because SELECTing 100 out of a million rows is horribly slow?
If you use properly parametrized queries (with parameters like @CustomerID instead of concatenating together your SQL statement), there shouldn’t be much difference between stored procedures and plain SQL statements. SQL Server caches the execution plans for plain queries as well as for stored procedures, and if they’re reused frequently, they won’t be tossed out of the cache too quickly. So strictly from a performance perspective, stored procedures don’t really give you much of a benefit.
Stored procedures can still be beneficial because they provide another layer of security: if all your data access goes through stored procedures, the “regular” users typically don’t even need read permissions on the base tables.
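To make the parametrized-query point concrete, here is a minimal sketch. It uses Python's built-in sqlite3 module purely as a stand-in so that it runs anywhere; the orders table, its columns, and the find_orders helper are made up for illustration. Against SQL Server you would go through your own driver and use a named parameter such as @CustomerID, but the idea is the same: the SQL text stays identical across calls and only the bound value changes, which is what allows a cached plan to be reused.

```python
import sqlite3

def find_orders(conn, customer_id):
    # The "?" placeholder keeps the statement text constant; the value is
    # bound separately instead of being concatenated into the SQL string.
    sql = "SELECT id, total FROM orders WHERE customer_id = ? ORDER BY id"
    return conn.execute(sql, (customer_id,)).fetchall()

# Hypothetical in-memory table, only here to make the sketch runnable.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)"
)
conn.executemany(
    "INSERT INTO orders (customer_id, total) VALUES (?, ?)",
    [(1, 9.5), (2, 20.0), (1, 3.25)],
)

print(find_orders(conn, 1))  # [(1, 9.5), (3, 3.25)]
```

The thing to avoid is building the statement with string concatenation, which produces a different SQL text for every customer and also opens the door to SQL injection.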
As for table size: if you have proper indexing in place, a simple Index Seek operation will take about 3-6 page reads in SQL Server to get to the leaf level, and that’s for up to several million data pages. The point is that you need proper indexing, and that’s not always easy or obvious to get right.
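A rough back-of-the-envelope sketch of why an index seek stays that cheap: each level of the index B-tree multiplies the number of pages it can address by the number of entries that fit on one index page. The fan-out values below (400 and 100 entries per page) are illustrative assumptions, not measurements; the real figure depends on how wide your index key is.

```python
def index_levels(leaf_pages, entries_per_page):
    """Count B-tree levels (leaf up to root) needed to address leaf_pages."""
    levels = 1          # the leaf level itself
    pages = leaf_pages
    while pages > 1:
        # Ceiling division: pages needed on the next level up to point
        # at every page on the current level.
        pages = -(-pages // entries_per_page)
        levels += 1
    return levels

# Assumed fan-outs, chosen only for illustration.
print(index_levels(5_000_000, 400))  # 4
print(index_levels(5_000_000, 100))  # 5
```

Because the depth grows with the logarithm of the table size, going from thousands to millions of rows adds only a level or two to each seek.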
One of the most important aspects in SQL Server is getting the clustered index right, since that defines the physical order of your data. The clustering key is also the most replicated, most redundant piece of data in your server, because it is carried in every nonclustered index on the table, so you want to make it as efficient as possible.
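To get a feel for how much that redundancy can cost, here is a deliberately simplified estimate. It assumes the clustering key is stored once per row in the clustered index and once more per row in each nonclustered index, and it ignores row overhead, compression, and upper index levels. The row count and index count are made-up numbers; the key widths are 4 bytes for an INT and 16 bytes for a UNIQUEIDENTIFIER.

```python
def clustering_key_bytes(rows, key_bytes, nonclustered_indexes):
    # One copy in the clustered index, plus one copy per nonclustered index.
    return rows * key_bytes * (1 + nonclustered_indexes)

ROWS = 10_000_000   # hypothetical table size
INDEXES = 4         # hypothetical number of nonclustered indexes

int_key = clustering_key_bytes(ROWS, 4, INDEXES)    # INT key
guid_key = clustering_key_bytes(ROWS, 16, INDEXES)  # UNIQUEIDENTIFIER key

print(int_key)   # 200000000
print(guid_key)  # 800000000
```

Under these assumptions the wider key costs four times as much space for the key alone, and that space has to be read, cached, and backed up along with everything else.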
Check out Kimberly Tripp’s outstanding blog post More Considerations For The Clustering Key, and also study the other blog posts she links to, to get a good understanding of what makes a good clustering key. This is absolutely crucial to get right!
Kimberly, the Queen of Indexing on SQL Server, also has a ton of great blog posts on how to choose good nonclustered indices, and on when less is more (don’t over-index your database either, since that could be even worse than no indexes at all).