I’m trying to come up with a relational model and database implementation, but keep running into this problem. But I don’t even know what to call it! Suggestions for improving the title would be appreciated.
I’ve tried to boil the problem down to its basics.
Simplified example:
Here’s the MySQLWorkbench diagram:

As you can see, the problems are all with the Atom table.
Outline of desired goal:
- I need to store data for specific atoms of proteins
- I may have data for some of the atoms, or none of the atoms
- I don’t want to be able to insert junk data — I’d like for the database constraints to prevent this
What I’m not sure about:
- whether there should be an
Atomtable — it seems like a join betweenAtom_refandResiduewould generate all the atoms of the protein — but I also need to store data about the atoms
Outline of problem:
- each atom needs a residue, and an atom_ref
- but since the residue is associated with a residue_ref, the atom_ref can only be one of the associated (with the residue_ref) atom_ref’s
- don’t know how to match the residue_ref of the Residue with the residue_ref of the atom_ref
What I’ve tried so far:
- add the pk of
Atom_refto the pk ofAtom— but then the residue_ref might not match that ofResidue - change
Residue.rr_nameto part of the pk — violates domain semantics
I know this is a poor explanation of the problem, I’m trying to figure out how to explain it more clearly! Suggestions for improvement are more than welcome!
If I understand you correctly, then what you’re after is an inclusion dependency between (a) the join of Atom and Residue, and (b) Atom_ref. (i.e. all atomnames in Atom, combined with the rr_name defined for it in residue, must be declared as being valid combinations, i.e. must appear in Atomref).
The way to do it using merely RI/FK, is to include rr_name in Atom, redundantly. Extend the FK from Atom to Residue to all three columns. This will guarantee you that the rr_names recorded in Atom remain consistent with the information in Residue. But since you have now introduced rr_name in Atom, you now have the means to ensure (through an FK atomname+rrname from Atom to Atom_ref) that whatever is recorded in Atom, is also consistent with the atomnames which have been declared (In atomref) to exist for the residueref involved.
Note that this “solution” makes updating your database harder (more redundancy to maintain, giving rise to more potential for violations), because you have just lowered the NF level of your design.
The other way to do it is to leave your design as is, and enforce the constraint through appropriate triggers on every involved table where an update could cause a violation of your business rule. That would be, deletes and updates on Atom_ref (i.e. anything that causes the disappearing of a valid combination that effectively exists somewhere), updates (of rr_name) on Residue, and inserts and updates on Atom (i.e. anything that might cause the appearance of some combination that might not be valid).