I have a fairly simple question about natural/surrogate key usage in a well-defined context which manifests itself often, and that i’m going to illustrate.
Let’s assume you are designing the DB schema for a product using SQL Server 2005 as DBMS. For the sake of simplicity let’s say there are only two entities involved, which have been mapped to 2 tables, Master and Slave.
Assume that:
- We can have 0..n Slave entries for a single Master’s row;
- Column set (A, B, C, D) in Master is the only candidate for primary key;
- Column B in Master is subject to changes over time;
- A, B, C, D are a mix of varchar, decimal and bigint columns.
The question is: how would you design keys/constraints/references for those tables?
Would you rather (argumenting your choice):
- Implement a composite natural key on Master on (A, B, C, D), and a related composite foreign key on Slave, or
- Introduce a surrogate key K on Master, let say an IDENTITY(1,1) column with a related (single column) foreign key on Slave, adding a UNIQUE constraint on Master’s (A, B, C, D), or
- Use a different approach.
As for me I’d go with option 2), mainly because of assumption 3) and performance-wise, but I’d like to hear someone else’s opinion (since there is quite an open debate on the topic).
Either 1,2 or 3. There isn’t necessarily enough information to determine whether a surrogate is necessary or how useful it might be. Are any of the compound key attributes also part of some key or constraint in the Slave table? Is there some other key of Master that could be used as a foreign key? The fact that a key value may change shouldn’t be the deciding factor because any key value may need to change – surrogates are no exception.
Unfortunately, much of that debate is based on the mistaken assumption that you need to choose between either a surrogate or a natural key. As your option 2 rightly suggests you can use both as the need arises. One is not a substitute for the other because simple keys and compound keys on different attributes obviously mean different things in your data model and enforce different constraints on your data.