This is a very basic database design/normalisation question.
Suppose I have a Books table with the following columns:
isbn|title|author|status
and status can be one of checked out, available, overdue, lost (stored as integers).
When adding rows I decide “actually, when the status is checked out, I want to store another field due_date”. I only want to store this field for books with status checked out, as it has no meaning otherwise.
What is the standard, correct, canonical way to do this?
One approach is to add the column and set it to NULL if the status is not checked out, but this sounds like a bad idea to me (for integrity among other things, e.g. what if the status is available and we also have a due_date?)
The other obvious answer is to create a DueDates table and store isbn|due_date pairs in it. This is the approach I normally take but it’s easy to end up with tables and JOINs all over the place.
I am not looking for how to store books specifically, that’s just an example of the problem and I want to know the standard solution.
Edit: Does the answer change if I decide that I want to add lots of fields for checked out status only (due_date, borrowed_by, checked_out_from, …) – and have all these as NULL if the status is not checked out?
The problem as you have stated it is fundamentally one of typing and subtyping. A “checked out book” is a type of book. An “available book” is a different type of book. A book can progress from state to state over time, and can thus belong to one subtype or another at different times.
In Object modeling, this kind of issue is handled through classes, subclasses, and inheritance.
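As a rough illustration of the object-modeling view, here is a minimal Python sketch. The class names `Book` and `CheckedOutBook` and the helpers `check_out` and `check_in` are made up for this example; they are not from the question. Because an object cannot change its class in place, moving between subtypes is modeled here by building a new object.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Book:
    """Base type: fields every book has (hypothetical names)."""
    isbn: str
    title: str
    author: str


@dataclass(frozen=True)
class CheckedOutBook(Book):
    """Subtype: only a checked-out book carries a due date."""
    due_date: date


def check_out(book: Book, due_date: date) -> CheckedOutBook:
    # Moving to the subtype means constructing a new, more specific object.
    return CheckedOutBook(book.isbn, book.title, book.author, due_date)


def check_in(book: CheckedOutBook) -> Book:
    # Moving back to the base type drops the subtype-only field.
    return Book(book.isbn, book.title, book.author)


available = Book("111", "Some Title", "Some Author")
borrowed = check_out(available, date(2024, 1, 31))
returned = check_in(borrowed)
```

With this shape, a due date on an available book is simply not representable, which is the property the question is asking the database to give.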
In ER modeling, this kind of issue is called “specialization”. You can find articles on the web dealing with ER specialization. I have not seen examples that deal with time-varying specialization. Most of the examples are time-invariant, like the Pets case.
In relational modeling and relational database design, there are several standard ways of building tables to implement specialization.
The first standard way is called “Single Table Inheritance”. This is basically what you’ve designed. You end up with a lot of NULLs for data that does not pertain to the subtype of a given row. But you don’t have to do any joins.
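The integrity worry raised in the question (an available book that also has a due_date) can be handled inside a single table with a CHECK constraint. The following is a minimal sketch using Python's built-in `sqlite3` module, not a definitive schema. The integer status codes and the choice to store the date as ISO text are assumptions for illustration, and it follows the question's rule that only checked-out books carry a due date.

```python
import sqlite3

# Assumed status codes; the question only says statuses are stored as integers.
AVAILABLE, CHECKED_OUT, OVERDUE, LOST = 0, 1, 2, 3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE books (
        isbn     TEXT PRIMARY KEY,
        title    TEXT NOT NULL,
        author   TEXT NOT NULL,
        status   INTEGER NOT NULL CHECK (status IN (0, 1, 2, 3)),
        due_date TEXT,  -- ISO date, NULL unless checked out
        -- due_date is present exactly when status = 1 (checked out)
        CHECK ((status = 1) = (due_date IS NOT NULL))
    )
""")

# A checked-out book with a due date satisfies the constraint.
conn.execute(
    "INSERT INTO books VALUES (?, ?, ?, ?, ?)",
    ("111", "Some Title", "Some Author", CHECKED_OUT, "2024-01-31"),
)

# An available book with a due date is rejected by the database.
try:
    conn.execute(
        "INSERT INTO books VALUES (?, ?, ?, ?, ?)",
        ("222", "Other Title", "Other Author", AVAILABLE, "2024-01-31"),
    )
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
```

The NULLs are still there, but the database now refuses rows where the status and the due date disagree. With more checked-out-only columns, the same CHECK would need one clause per column.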
A second standard way is called “Class Table Inheritance”. In this approach, there is a separate table for each class and subclass, and they share a primary key. You can look up both “Class Table Inheritance” and “Shared Primary Key” on SO and on the web. You do more joining, but you have fewer NULLs.
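A shared-primary-key layout might look like the sketch below, again using Python's built-in `sqlite3` module. The table name `checked_out_books` and the `borrowed_by` column type are assumptions for illustration. To keep it short, the sketch models only the checked-out subtype: a book counts as checked out exactly when it has a row in the subtype table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite enforces foreign keys only when this pragma is on.
conn.execute("PRAGMA foreign_keys = ON")

# Base table: columns that every book has.
conn.execute("""
    CREATE TABLE books (
        isbn   TEXT PRIMARY KEY,
        title  TEXT NOT NULL,
        author TEXT NOT NULL
    )
""")

# Subtype table: its primary key is also a foreign key to books,
# so each book has at most one checked-out row and no columns are NULL.
conn.execute("""
    CREATE TABLE checked_out_books (
        isbn        TEXT PRIMARY KEY REFERENCES books(isbn) ON DELETE CASCADE,
        due_date    TEXT NOT NULL,
        borrowed_by TEXT NOT NULL
    )
""")

with conn:
    conn.executemany(
        "INSERT INTO books VALUES (?, ?, ?)",
        [("111", "Some Title", "Some Author"),
         ("222", "Other Title", "Other Author")],
    )
    # Checking a book out means adding its subtype row.
    conn.execute(
        "INSERT INTO checked_out_books VALUES (?, ?, ?)",
        ("111", "2024-01-31", "alice"),
    )

# The join that this design costs: due_date is NULL only in the query
# result, for books that have no subtype row.
rows = conn.execute("""
    SELECT b.isbn, c.due_date
    FROM books AS b
    LEFT JOIN checked_out_books AS c ON c.isbn = b.isbn
    ORDER BY b.isbn
""").fetchall()

# Returning the book means deleting its subtype row.
with conn:
    conn.execute("DELETE FROM checked_out_books WHERE isbn = ?", ("111",))
```

Adding more checked-out-only fields means adding NOT NULL columns to the subtype table, while the base table stays unchanged.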
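There are other ways, such as “Concrete Table Inheritance”, where each concrete subtype gets its own complete table and there is no shared base table.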
Which way is best depends on the case at hand.