I am in the process of creating a second version of my technical wiki site, and one of the things I want to improve is the database design. The problem (or so I think) is that to display each document, I need to join upwards of 15 tables. I have a bunch of lookup tables that contain descriptive data associated with each wiki entry, such as the programmer used, CPU, tags, peripherals, PCB layout software, difficulty level, etc.
Here is an example of the layout:
doc
--------------
id | author_id | doc_type_id .....
1 | 8 | 1
2 | 11 | 3
3 | 13 | 3
_
lookup_programmer
--------------
doc_id | programmer_id
1 | 1
1 | 3
2 | 2
_
programmer
--------------
programmer_id | programmer
1 | USBtinyISP
2 | PICkit
3 | .....
Since some doc IDs may have multiple entries for a single attribute (such as programmer), I have designed the DB to accommodate this. The other 10 attributes have a similar layout to the 2 programmer tables above. To display a single document article, approx 20 tables are joined.
I use the Sphinx search engine to find articles with certain characteristics. Essentially, Sphinx indexes all of the data (it does not store it) and returns the wiki doc IDs of interest based on the filters presented. If I want to find articles that use a certain programmer and then sort by date, MySQL has to first join ALL documents with the 2 programmer tables, then filter, and finally sort the remaining rows by insert time. No index can help with ordering the filtered results, since the sort is done in a temporary table (it takes a LONG time with 150k doc IDs). As you can imagine, it gets worse really quickly as more parameters need to be filtered.
The fact that I have to rely on Sphinx to return, say, all wiki entries that use a certain CPU AND programmer is what leads me to believe that there is a DB smell in my current setup.
Edit: It looks like I have implemented an entity–attribute–value (EAV) model.
I don’t see anything here that suggests you’ve implemented EAV. Instead, it looks like you’ve assigned every row in every table an ID number. That’s a guaranteed way to increase the number of joins, and it has nothing to do with normalization. (There is no “I’ve now added an id number” normal form.)
Pick one lookup table. (I’ll use “programmer” in my example.) Don’t build it like this.
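As a rough sketch of that layout, here are the question's `programmer` and `lookup_programmer` tables with a surrogate ID on the lookup table. It uses Python's built-in `sqlite3` module purely so the example runs on its own; the column types and constraints are assumptions, and the MySQL DDL would differ slightly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Lookup table keyed by an arbitrary number.
CREATE TABLE programmer (
    programmer_id INTEGER PRIMARY KEY,
    programmer    TEXT NOT NULL UNIQUE
);
-- Referencing table stores only the number, not the name.
CREATE TABLE lookup_programmer (
    doc_id        INTEGER NOT NULL,
    programmer_id INTEGER NOT NULL REFERENCES programmer (programmer_id),
    PRIMARY KEY (doc_id, programmer_id)
);
INSERT INTO programmer VALUES (1, 'USBtinyISP'), (2, 'PICkit');
INSERT INTO lookup_programmer VALUES (1, 1), (2, 2);
""")

# Getting a readable name for each document requires a join,
# because lookup_programmer holds only the surrogate number.
surrogate_rows = conn.execute("""
    SELECT lp.doc_id, p.programmer
    FROM lookup_programmer AS lp
    JOIN programmer AS p ON p.programmer_id = lp.programmer_id
    ORDER BY lp.doc_id
""").fetchall()
print(surrogate_rows)  # [(1, 'USBtinyISP'), (2, 'PICkit')]
```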
Instead, build it like this.
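A minimal sketch of the alternative, assuming the programmer's name itself serves as the primary key. Again this uses Python's built-in `sqlite3` module only to keep the example runnable; the column type is an assumption, and in MySQL you would likely use a `VARCHAR` of suitable length.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- The name is the key; there is no separate ID number.
CREATE TABLE programmer (
    programmer TEXT NOT NULL PRIMARY KEY
);
INSERT INTO programmer VALUES ('USBtinyISP'), ('PICkit');
""")

natural_keys = [row[0] for row in conn.execute(
    "SELECT programmer FROM programmer ORDER BY programmer")]
print(natural_keys)  # ['PICkit', 'USBtinyISP']

# The primary key still rejects duplicate names.
try:
    conn.execute("INSERT INTO programmer VALUES ('PICkit')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True
print(duplicate_rejected)  # True
```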
And in the tables that reference it, consider cascading updates and deletes.
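A sketch of what cascading could look like for the question's `lookup_programmer` table, assuming it now stores the programmer's name directly. It uses Python's built-in `sqlite3` module so it runs standalone; the `PRAGMA` line is SQLite-specific, the rename to 'PICkit 3' is a made-up example, and in MySQL the same `ON UPDATE CASCADE ON DELETE CASCADE` clauses require InnoDB tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only: enforcement is off by default
conn.executescript("""
CREATE TABLE programmer (
    programmer TEXT NOT NULL PRIMARY KEY
);
CREATE TABLE lookup_programmer (
    doc_id     INTEGER NOT NULL,
    programmer TEXT NOT NULL
        REFERENCES programmer (programmer)
        ON UPDATE CASCADE ON DELETE CASCADE,
    PRIMARY KEY (doc_id, programmer)
);
INSERT INTO programmer VALUES ('USBtinyISP'), ('PICkit');
INSERT INTO lookup_programmer VALUES (1, 'USBtinyISP'), (2, 'PICkit');
""")

query = "SELECT doc_id, programmer FROM lookup_programmer ORDER BY doc_id"

# Renaming a lookup value propagates to every referencing row.
conn.execute("UPDATE programmer SET programmer = 'PICkit 3' WHERE programmer = 'PICkit'")
after_update = conn.execute(query).fetchall()
print(after_update)  # [(1, 'USBtinyISP'), (2, 'PICkit 3')]

# Deleting a lookup value removes the rows that referenced it.
conn.execute("DELETE FROM programmer WHERE programmer = 'USBtinyISP'")
after_delete = conn.execute(query).fetchall()
print(after_delete)  # [(2, 'PICkit 3')]
```

Whether `ON DELETE CASCADE` is appropriate depends on whether losing the association rows is acceptable; `ON DELETE RESTRICT` is the more conservative choice.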
What have you gained? You keep all the data integrity that foreign key references give you, your rows are more readable, and you’ve eliminated a join. Build all your “lookup” tables that way, and you eliminate one join per lookup table. (And unless you have many millions of rows, you’re not likely to see any degradation in performance.)
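To illustrate the saved join, here is a sketch of the question's "find articles that use a certain programmer, sorted by date" query against natural-key tables. It uses Python's built-in `sqlite3` module to stay runnable; the `inserted_at` column, the dates, and the row linking doc 3 to PICkit are made-up sample data, not part of the original schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE doc (
    id          INTEGER PRIMARY KEY,
    author_id   INTEGER NOT NULL,
    inserted_at TEXT NOT NULL          -- hypothetical column name
);
CREATE TABLE programmer (
    programmer TEXT NOT NULL PRIMARY KEY
);
CREATE TABLE lookup_programmer (
    doc_id     INTEGER NOT NULL REFERENCES doc (id),
    programmer TEXT NOT NULL REFERENCES programmer (programmer),
    PRIMARY KEY (doc_id, programmer)
);
INSERT INTO doc VALUES
    (1, 8,  '2012-01-03'),
    (2, 11, '2012-01-01'),
    (3, 13, '2012-01-02');
INSERT INTO programmer VALUES ('USBtinyISP'), ('PICkit');
INSERT INTO lookup_programmer VALUES
    (1, 'USBtinyISP'), (2, 'PICkit'), (3, 'PICkit');
""")

# Only one join (doc to lookup_programmer); the programmer table
# is not touched because the name is already in lookup_programmer.
pickit_docs = [row[0] for row in conn.execute("""
    SELECT d.id
    FROM doc AS d
    JOIN lookup_programmer AS lp ON lp.doc_id = d.id
    WHERE lp.programmer = 'PICkit'
    ORDER BY d.inserted_at DESC
""")]
print(pickit_docs)  # [3, 2]
```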