If we had a table like this:
Books (pretend “ISBN” doesn’t exist)
- Author
- Title
- Edition
- Year of publication
- Price
One could argue that {Author,Title,Edition} could be a candidate/primary key.
What determines whether the candidate/primary key should be {Author,Title,Edition} or whether an ID column should be used, with {Author,Title,Edition} a unique index/key constraint?
So is
- Author (PK)
- Title (PK)
- Edition (PK)
- Year of publication
- Price
better, or:
- ID (PK)
- Author
- Title
- Edition
- Year of publication
- Price
where {Author,Title,Edition} is an additional unique index/constraint?
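To make the two alternatives concrete, here is a minimal sketch of both as SQLite DDL, driven from Python's standard `sqlite3` module. The column types and the shortened column name `Year` (for "Year of publication") are assumptions, and other RDBMSs spell types and auto-numbering differently. `PRAGMA table_info` reports which columns form each table's primary key.

```python
import sqlite3

# Design 1: natural composite primary key (types are assumptions).
NATURAL_PK = """
CREATE TABLE Book (
    Author  TEXT    NOT NULL,
    Title   TEXT    NOT NULL,
    Edition INTEGER NOT NULL,
    Year    INTEGER,
    Price   NUMERIC,
    PRIMARY KEY (Author, Title, Edition)
)
"""

# Design 2: surrogate integer key plus a UNIQUE constraint on the natural key.
SURROGATE_PK = """
CREATE TABLE Book (
    ID      INTEGER PRIMARY KEY,
    Author  TEXT    NOT NULL,
    Title   TEXT    NOT NULL,
    Edition INTEGER NOT NULL,
    Year    INTEGER,
    Price   NUMERIC,
    UNIQUE (Author, Title, Edition)
)
"""

def primary_key_columns(ddl: str) -> list:
    """Create the table in a fresh in-memory DB and return its PK columns in key order."""
    con = sqlite3.connect(":memory:")
    try:
        con.execute(ddl)
        # Each row is (cid, name, type, notnull, dflt_value, pk); pk is the
        # 1-based position within the primary key, or 0 if not part of it.
        rows = con.execute("PRAGMA table_info(Book)").fetchall()
        return [r[1] for r in sorted(rows, key=lambda r: r[5]) if r[5] > 0]
    finally:
        con.close()

print(primary_key_columns(NATURAL_PK))    # ['Author', 'Title', 'Edition']
print(primary_key_columns(SURROGATE_PK))  # ['ID']
```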
Say that {Author, Title, Edition} uniquely identifies a book. Then the following holds:
- It is a superkey: it uniquely identifies a tuple (row).
- It is irreducible: removing any of the columns means it is no longer a key.
- It is a candidate key: an irreducible superkey is a candidate key.
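The superkey and irreducibility properties can be checked mechanically against a set of rows. This is a minimal sketch with hypothetical placeholder data. Keep in mind that data can only show that a column set is *not* a key. Whether {Author, Title, Edition} really is a key is a business rule (the "say that" above), not something a query can prove. On the sample below, `Year` alone also passes, purely by accident of the sample.

```python
from itertools import combinations

# Hypothetical sample rows (placeholder values, not real books).
ROWS = [
    {"Author": "Author A", "Title": "Title X", "Edition": 1, "Year": 2001, "Price": 30},
    {"Author": "Author A", "Title": "Title X", "Edition": 2, "Year": 2005, "Price": 35},
    {"Author": "Author A", "Title": "Title Y", "Edition": 1, "Year": 2003, "Price": 30},
    {"Author": "Author B", "Title": "Title X", "Edition": 1, "Year": 2004, "Price": 40},
]

def is_superkey(rows, cols):
    """True if no two rows agree on every column in `cols`."""
    seen = set()
    for row in rows:
        key = tuple(row[c] for c in cols)
        if key in seen:
            return False
        seen.add(key)
    return True

def is_candidate_key(rows, cols):
    """True if `cols` is a superkey and irreducible.

    Any superset of a superkey is again a superkey, so it is enough to check
    that no subset with exactly one column removed is still a superkey.
    """
    if not is_superkey(rows, cols):
        return False
    return not any(is_superkey(rows, sub) for sub in combinations(cols, len(cols) - 1))

print(is_candidate_key(ROWS, ("Author", "Title", "Edition")))           # True
print(is_candidate_key(ROWS, ("Author", "Title", "Edition", "Price")))  # False: reducible
print(is_superkey(ROWS, ("Author", "Title")))                           # False: two editions of Title X
print(is_candidate_key(ROWS, ("Year",)))                                # True, only by accident of this sample
```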
Now let’s consider the ID (integer).
I can reason that the Book table's key will show up in a few other tables as a foreign key, and also in a few indexes. So it will take quite a bit of space (say three columns × 40 characters, or whatever the sizes are) in each of these tables, plus in the matching indexes.
To make these “other” tables and indexes smaller, I can add a unique integer column to the Book table, to be used as the key that the other tables reference as a foreign key. Call it BookID.
With BookID being (it must be) unique too, the Book table now has two candidate keys. Now I can select BookID as the primary key.
However, {Author, Title, Edition} must stay a key (unique) in order to prevent two rows with different BookID values but the same Author, Title and Edition, i.e. the same book stored twice.
To sum it up, adding BookID, and choosing it as the primary key, did not stop {Author, Title, Edition} from being a (candidate) key. It still must have its own unique constraint, and usually the matching index.
Also note that, from a design point of view, this decision was made at the “physical level”. In general, at the logical level of design, this ID does not exist: it was introduced while considering column sizes and indexes. So the physical schema was derived from the logical one. Depending on the DB size, RDBMS and hardware used, none of this size reasoning may have a measurable effect, so using {Author, Title, Edition} as the PK may be a perfectly good design, until proven otherwise.
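A minimal sketch of the physical design described above, again in SQLite via Python's `sqlite3` module. `Review` is a hypothetical referencing table that stores only the integer BookID. The sketch builds the Book table with and without `UNIQUE (Author, Title, Edition)` to show the duplicate that the constraint exists to prevent. SQLite only enforces foreign keys after `PRAGMA foreign_keys = ON`.

```python
import sqlite3

def make_db(with_natural_key_constraint: bool) -> sqlite3.Connection:
    con = sqlite3.connect(":memory:")
    con.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
    unique = ", UNIQUE (Author, Title, Edition)" if with_natural_key_constraint else ""
    # BookID is INTEGER PRIMARY KEY, so SQLite assigns it automatically.
    con.execute(f"""
        CREATE TABLE Book (
            BookID  INTEGER PRIMARY KEY,
            Author  TEXT    NOT NULL,
            Title   TEXT    NOT NULL,
            Edition INTEGER NOT NULL,
            Year    INTEGER,
            Price   NUMERIC{unique}
        )""")
    # Hypothetical referencing table: one integer column instead of three text columns.
    con.execute("""
        CREATE TABLE Review (
            ReviewID INTEGER PRIMARY KEY,
            BookID   INTEGER NOT NULL REFERENCES Book (BookID),
            Stars    INTEGER
        )""")
    return con

def duplicate_accepted(con: sqlite3.Connection) -> bool:
    """Insert the same (Author, Title, Edition) twice; True if the second insert succeeds."""
    sql = "INSERT INTO Book (Author, Title, Edition) VALUES ('Author A', 'Title X', 1)"
    con.execute(sql)
    try:
        con.execute(sql)  # gets a new BookID, so only the UNIQUE constraint can stop it
        return True
    except sqlite3.IntegrityError:
        return False

def dangling_review_rejected(con: sqlite3.Connection) -> bool:
    """True if a Review pointing at a nonexistent BookID is rejected by the FK."""
    try:
        con.execute("INSERT INTO Review (BookID, Stars) VALUES (999, 5)")
        return False
    except sqlite3.IntegrityError:
        return True

print(duplicate_accepted(make_db(False)))       # True: same book stored under two BookIDs
print(duplicate_accepted(make_db(True)))        # False: UNIQUE constraint rejects it
print(dangling_review_rejected(make_db(True)))  # True
```

Other RDBMSs spell the auto-numbered column differently (for example an identity column), but the point carries over: the surrogate key replaces the natural key in referencing tables, not the natural key's uniqueness.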