Foreword: Although I don’t think it’s exactly a duplicate, feel free to close if you think this is too similar to this previous question from me.
I am refactoring a database design in which I have four superclass tables from which several others must derive. I am unsure whether I should add four “type-identifying” tables and join one to each superclass, so as to identify the sub-type of each record. The problem is that, without them, the design is already pretty big (14 tables), and since one of the requirements is that it must be easily extensible, I fear ending up with a design of 30 or more tables. In short… can this kind of table be left out of the design?
PS: The goal is to have a highly and easily extensible design. For example, one of the tables represents a message, whose sub-types can be an SMS, an MMS, an e-mail, a tweet, a post on Facebook and so on. Of course, common information goes in the superclass table, and the remaining information goes into the sub-type tables as needed.
It’s easier to understand 30 tables than 30,000 lines of code. I’ve worked with databases that have over 100 tables in them. I wouldn’t worry about 30.
The design of tables to capture entities grouped into a superclass and several subclasses is an example of the gen-spec (generalization-specialization) design pattern. Gen-spec is familiar to object-oriented programmers via class inheritance. But the design of relational tables to reflect the gen-spec pattern is often omitted from introductory texts on database design.
Fortunately, it’s well understood. A web search on “generalization specialization relational modeling” will yield lots of articles on the subject, including several previous SO discussions.
As you stated, data common to all the specialized entities goes in the general (superclass) table, while data peculiar to a given specialized entity goes in the appropriate specialized (subclass) table.
The trick in the design is the way the subclass tables get a primary key. The primary key for the subclass tables is not an autoincremented number. It’s a copy of the PK from the superclass table, so it also serves as a foreign key back to that table. This makes it really easy to get all the data regarding a given specialty by just doing a join. It also makes it unnecessary to include type fields, since each specialized table covers its own subclass.
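To make the shared-key idea concrete, here is a minimal sketch using Python’s built-in sqlite3 module. The table and column names (message, sms, email, sender, phone_number, subject) are made up for illustration and are not taken from the question’s actual schema; your own DBMS will have its own DDL syntax.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK checks off by default

# Hypothetical schema: one superclass table, two subclass tables.
# Each subclass PK is also a foreign key to the superclass PK.
conn.executescript("""
CREATE TABLE message (
    message_id INTEGER PRIMARY KEY,
    sender     TEXT NOT NULL,
    sent_at    TEXT NOT NULL
);
CREATE TABLE sms (
    message_id   INTEGER PRIMARY KEY REFERENCES message(message_id),
    phone_number TEXT NOT NULL
);
CREATE TABLE email (
    message_id INTEGER PRIMARY KEY REFERENCES message(message_id),
    subject    TEXT NOT NULL
);
""")

# Insert the general row first, then reuse its key for the specialized row.
cur = conn.execute(
    "INSERT INTO message (sender, sent_at) VALUES (?, ?)",
    ("alice", "2024-01-01"),
)
conn.execute(
    "INSERT INTO sms (message_id, phone_number) VALUES (?, ?)",
    (cur.lastrowid, "555-0100"),
)

cur = conn.execute(
    "INSERT INTO message (sender, sent_at) VALUES (?, ?)",
    ("bob", "2024-01-02"),
)
conn.execute(
    "INSERT INTO email (message_id, subject) VALUES (?, ?)",
    (cur.lastrowid, "Hello"),
)
conn.commit()

# A plain join returns only the SMS messages; no type column is involved.
sms_rows = conn.execute("""
    SELECT m.message_id, m.sender, s.phone_number
    FROM message AS m
    JOIN sms AS s ON s.message_id = m.message_id
""").fetchall()

email_rows = conn.execute("""
    SELECT m.message_id, m.sender, e.subject
    FROM message AS m
    JOIN email AS e ON e.message_id = m.message_id
""").fetchall()
```

The join itself does the type filtering: a message row with no matching row in sms simply drops out of the first query.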
This is a little difficult to set up and to update, but it pays for itself at retrieval time.
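The extra work at insert time is mostly about writing the superclass row and the subclass row together. One way to handle that, sketched here with Python’s sqlite3 module and hypothetical message and sms tables (the names and the add_sms helper are assumptions for illustration), is to wrap both inserts in a single transaction so a failed subclass insert doesn’t leave an orphaned superclass row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")

# Hypothetical schema, reduced to one superclass and one subclass table.
conn.executescript("""
CREATE TABLE message (
    message_id INTEGER PRIMARY KEY,
    sender     TEXT NOT NULL
);
CREATE TABLE sms (
    message_id   INTEGER PRIMARY KEY REFERENCES message(message_id),
    phone_number TEXT NOT NULL
);
""")


def add_sms(conn, sender, phone_number):
    """Insert the general and the specialized row as one unit of work."""
    # "with conn" commits if the block succeeds and rolls back if it raises.
    with conn:
        cur = conn.execute(
            "INSERT INTO message (sender) VALUES (?)", (sender,)
        )
        conn.execute(
            "INSERT INTO sms (message_id, phone_number) VALUES (?, ?)",
            (cur.lastrowid, phone_number),
        )
        return cur.lastrowid


first_id = add_sms(conn, "alice", "555-0100")

# The second insert violates NOT NULL, so the whole transaction is undone,
# including the message row that was already written.
try:
    add_sms(conn, "bob", None)
except sqlite3.IntegrityError:
    pass

message_count = conn.execute("SELECT COUNT(*) FROM message").fetchone()[0]
```

After the failed call, message still holds only the first row, so the two tables stay in step.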