I rarely come across a situation where a single table for multiple entity types seems better than one table per entity type. Here’s an example that makes sense to me, but academically it seems wrong.
QUESTION: Can I do this and still have a “sound” architecture?
Example Follows
Suppose two entity types, a corporation and a person. A corporation is typically owned by a person, but sometimes another corporation owns a corporation.
Holding onto that thought, and adding to it, let’s say that every corporation has a registered agent attached to it who is responsible for the legal creation of the corporation. Furthering my illustration, the registered agent can be either a person or a corporation.
If you consider that the owner [parent] to the corporation [child] can be either a person or a corporation, you may begin to see the challenge in keeping the third normal form and avoiding redundancy.
In contrast to my example, if only people could own Corporations, the Ownership link table is very conventional, having columns: OwnershipID (sort of unecessary), CorporationID, PersonID.
Instead, you need something like: OwnershipID, CorporationID, OwnerID, OwnerEntityType (corp or person). Don’t get me wrong, you can make this work, but it won’t be fun, to say the least.
Continuing on with the example I gave, you need to assign an agent to every Corporation. Usually, the agent is one of the owners (a person). In which case, you really do want to link back to the one record of that person. You don’t want to have record of the person as an owner and then again as an agent (in an Agent table). That would be redundant.
Similarly to that “problem” a registered agent can also be a corporation, such as a law firm, a CPA, or a biz filings company, to name some typical examples. Just like the agent-person, an agent-corporation really should not get its own record as an agent-entity. Instead it needs to link back to the of its corporate existence in the Corporation table. [except that I’m ultimately saying to not have a CorporationEntity table]
Just like the link table that matched each corporation to its owner(s) of any type, person or corporation, you could have an agent link table of: AgentRepresentationID, CorporationID, AgentID, AgentType… but again, it would be ugly (IMO) when you have to pull together the related agents — some from the Person table, some from the Corporation table.
This is why I say, in cases like this, you can see how a neutral entity type can be advantageous. It would be something like this:
Table: EntityAll
Key Columns: EntityID, EntityType (or EntityTypeID if you insist, link out to get the description), EntityName (there are concerns with names and different types… off topic to this post)
Link Table: CorporationOwnership
Key Columns: OwnershipID (again, my comment that this is kind of unecessary), ChildEntityID (the entity being owned; named “Child” for clarity, I wouldn’t name it that) ParentEntityID (the parent entity)
Link Table: AgentRepresentation
Key Columns: AgentRepresentationID (…I won’t say it), CorporationEntityID (the corp entity being represented), AgentEntityID (from the Entity table, equating to the record that’s the agent here)
While you might be OK with my architecture, you should be a little bothered by the column naming in the link tables. It bothers me. Typically the second and third column names in those tables match exactly the name of the columns you JOIN in each entity’s respective table (haha, but each entity doesn’t have a respective table so you can’t have the link table column names match the source column names because they are THE same column). Technically this does not matter, but it will break your naming conventions which does matter, but not enough to not do it.
In case I haven’t driven it home well enough yet, here is how you’ll pull it together. You JOIN the EntityAll table on its own self to get what you need.
List all Corps and their owners (in T-SQL):
SELECT Corp.EntityName as CorpName, Owner.EntityName as OwnerName
FROM EntityAll as Corp
JOIN CorporationOwnership as Link on (Corp.EntityID = Link.ChildEntityID)
JOIN EntityAll as Owner on (Link.ParentEntityID = Owner.EntityID)
Consequently, you’d do the same thing to return the agent, instead of the owner(s).
I realize this is not how we’re trained to architect tables, but I feel pretty strongly that my solution eliminates redundant data AND makes it easier to code, manage and read.
P.S. I recently provided this example as an answer to an old question on SO. Being an old question, there was no dialogue. I need to implement this very example, and I’m curious about the ramifications of this architecture.
Here is the previous question/answer:
Good db table design: One table mixing different entities or separate table for each entity
I think it was Hugh Darwen who coined the terms ‘distributed key’ and ‘distributed foreign keys’, where a single referenced key value exists in exactly one of multiple referencing relvars (tables); this would require a related concept of ‘multple assignment’ in order to atomically insert to both the referenced and referencing relvars.
While this could in theory be achieved in SQL-92 using deferrable schema-level
ASSERTIONs (or perhapsCHECKconstraints that support subqueries), it’s rather a clunky process, is procedural (rather than set-based) and there isn’t a SQL product that has ever support this functionality (or ever will, I susepct).The best we can do with available SQL products is to use a compound key
(entity_ID, entity_type)with aCHECKconstraint on theentity_typein referencing tables to ensure there is no more than one referencing key value (note this is not the same as ‘exactly one referencing key value’) e.g.This superclass-subclass pattern is very common in SQL.
Ideally, a table name should reflect the nature of the set as a whole. You many need to think beyond a compound of other sets’ names; perhaps ask a expert in the particular field of business e.g. an accountant may use the term ‘payroll’ rather than ‘EmployeesSalaries’.
Another ideal is for a column’s name to remain the same throughout the schema but with a subclassing approach you often need to qualify them (and this bothers me!) e.g.
Note I would have used a single column key for
agent_person_IDe.g.because there is no restriction on entity type. In principle I feel better about retaining the two-column compound key for all references throughout the schema and I find in practice as often as not I need to now the entity type anyhow so this SQL DDL is saving a
JOINin SQL DML 🙂