I’ve had a database in production for nearly 3 years, on SQL Server 2008 (it was on ’05 before that). It has been fine, but it isn’t very performant, so I’m tweaking the schema and queries to help speed some things up. Also, a score of the main tables contain around 1-3 million rows each (to give you an estimate of the sizes).
Here’s a sample database diagram (sorry, I’m under NDA so I can’t display the original) :-
(Image: sample database schema diagram, http://img11.imageshack.us/img11/4608/dbschemaexample.png)
Things to note (which are directly related to my problem) :-
- A vehicle can have 0 (NULL) or 1 Radio. (Left Outer Join)
- A vehicle can have 0 (NULL) or 1 CupHolder. (Left Outer Join)
- A vehicle has 1 Tyre Type (Inner Join).
Firstly, this looks like a normalised database schema. I suck at DB theory, so I’m guessing this is 3NF (at least) … famous last words 🙂
Now, this is killing my database performance because these two outer joins and the inner join are getting called a lot AND there are also a few more joins in many statements.
To try and fix this, I thought I might try an indexed view. Creating the view is a piece of cake. But indexing it doesn’t work -> you can’t create indexed views with outer joins OR self-referencing tables (also another prob 🙁 ).
So, I’ve cried for hours (and /wrists, dyed hair and wrote an emo song about it and put it on myfailspace) and did the following…
- Added a new row into each of the ‘optional’ outer join tables (in this example, Radios and CupHolders). ID = 0, rest of the data = ‘Unknown Blah’ or 0’s.
- Updated the parent tables, so that any NULL values now hold a 0.
- Changed the relationships from outer joins to inner joins.
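Those three steps can be sketched roughly like this. It is only an illustration: it runs against an in-memory SQLite database from Python rather than SQL Server, and the table and column names (Vehicles, Radios, RadioId, Name) plus the sample rows are made up, since the real schema is under NDA.

```python
import sqlite3

# Hypothetical, cut-down schema; SQLite stands in for SQL Server here.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Radios (RadioId INTEGER PRIMARY KEY, Name TEXT NOT NULL);
    CREATE TABLE Vehicles (
        VehicleId INTEGER PRIMARY KEY,
        Name      TEXT NOT NULL,
        RadioId   INTEGER REFERENCES Radios(RadioId)  -- NULL means "no radio"
    );
    INSERT INTO Radios   VALUES (1, 'FM Deluxe');
    INSERT INTO Vehicles VALUES (1, 'Hatchback', 1), (2, 'Ute', NULL);
""")

# Rows as seen through the original LEFT OUTER JOIN.
before = conn.execute("""
    SELECT v.Name, r.Name FROM Vehicles v
    LEFT OUTER JOIN Radios r ON r.RadioId = v.RadioId
    ORDER BY v.VehicleId
""").fetchall()

with conn:  # commit both changes together
    # Step 1: add the placeholder row with ID = 0.
    conn.execute("INSERT INTO Radios (RadioId, Name) VALUES (0, 'Unknown Radio')")
    # Step 2: point every NULL foreign key at the placeholder.
    conn.execute("UPDATE Vehicles SET RadioId = 0 WHERE RadioId IS NULL")

# Step 3: the same query can now use an INNER JOIN without dropping vehicles.
after = conn.execute("""
    SELECT v.Name, r.Name FROM Vehicles v
    INNER JOIN Radios r ON r.RadioId = v.RadioId
    ORDER BY v.VehicleId
""").fetchall()

print(before)
print(after)
```

The inner join returns one row per vehicle, just like the outer join did; the only difference is that the radio-less vehicle now shows the placeholder name instead of NULL.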
Now, this works. I can even make my indexed view, which is very fast now.
So … I’m in pain. This just goes against everything I’ve been taught. I feel dirty. Alone. Infected.
Is this a bad thing to do? Is this a common scenario of denormalizing a database for the sake of performance?
I would love some thoughts on this, please 🙂
PS. Those images are random Google finds, so not me.
Databases should always be designed and initially implemented in 3NF. But the world is a place of reality, not ideals, and it’s okay to revert to 2NF (or even 1NF) for performance reasons. Don’t beat yourself up about it; pragmatism beats dogmatism in the real world all the time.
Your solution, if it improves performance, is a good one. The idea of having an actual radio (for example), manufactured by nobody and having no features, is not a bad one; it’s been done a lot before, believe me 🙂 The only reason you would leave that field as NULL is to see which vehicles have no radio, and there’s little difference between a query that checks for NULL and one that checks for an ID of 0.
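As a sketch of that comparison, here is the "no radio" lookup written both ways. The table names (VehiclesNullable, VehiclesSentinel), the RadioId column and the sample rows are hypothetical, and it runs on an in-memory SQLite database from Python rather than on SQL Server.

```python
import sqlite3

# Two hypothetical variants of the same table: NULL vs. the ID-0 placeholder.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE VehiclesNullable (VehicleId INTEGER PRIMARY KEY, RadioId INTEGER);
    CREATE TABLE VehiclesSentinel (VehicleId INTEGER PRIMARY KEY, RadioId INTEGER NOT NULL);
    INSERT INTO VehiclesNullable VALUES (1, 7), (2, NULL), (3, NULL);
    INSERT INTO VehiclesSentinel VALUES (1, 7), (2, 0), (3, 0);
""")

# Vehicles without a radio when "no radio" is stored as NULL.
no_radio_null = conn.execute(
    "SELECT VehicleId FROM VehiclesNullable WHERE RadioId IS NULL ORDER BY VehicleId"
).fetchall()

# Vehicles without a radio when "no radio" is stored as the placeholder ID 0.
no_radio_zero = conn.execute(
    "SELECT VehicleId FROM VehiclesSentinel WHERE RadioId = 0 ORDER BY VehicleId"
).fetchall()

print(no_radio_null, no_radio_zero)
```

Both queries pick out the same vehicles; only the predicate changes.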
My first thought was to simply combine the four tables into one and hang the duplicate data issue. Most problems with DBMSs stem from poor performance rather than low storage space.
Maybe keep that as your fallback position if your current de-normalized schema becomes slow as well.
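For what it’s worth, that single-table fallback might look something like the sketch below. Everything in it is an assumption for illustration: the four table names, their columns, the sample rows and the VehiclesFlat name are invented, and it uses an in-memory SQLite database from Python instead of SQL Server.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical normalised tables.
    CREATE TABLE Radios     (RadioId INTEGER PRIMARY KEY, RadioName TEXT);
    CREATE TABLE CupHolders (CupHolderId INTEGER PRIMARY KEY, CupHolderName TEXT);
    CREATE TABLE TyreTypes  (TyreTypeId INTEGER PRIMARY KEY, TyreName TEXT);
    CREATE TABLE Vehicles (
        VehicleId INTEGER PRIMARY KEY, VehicleName TEXT,
        RadioId INTEGER, CupHolderId INTEGER, TyreTypeId INTEGER NOT NULL
    );
    INSERT INTO Radios     VALUES (1, 'FM Deluxe');
    INSERT INTO CupHolders VALUES (1, 'Twin');
    INSERT INTO TyreTypes  VALUES (1, 'All-season');
    INSERT INTO Vehicles   VALUES (1, 'Hatchback', 1, NULL, 1), (2, 'Ute', NULL, 1, 1);

    -- One wide table: lookup values are copied onto every vehicle row,
    -- so reads need no joins at the cost of duplicated data.
    CREATE TABLE VehiclesFlat AS
    SELECT v.VehicleId, v.VehicleName, r.RadioName, c.CupHolderName, t.TyreName
    FROM Vehicles v
    LEFT OUTER JOIN Radios     r ON r.RadioId     = v.RadioId
    LEFT OUTER JOIN CupHolders c ON c.CupHolderId = v.CupHolderId
    INNER JOIN      TyreTypes  t ON t.TyreTypeId  = v.TyreTypeId;
""")

# Reads now hit a single table.
flat = conn.execute("SELECT * FROM VehiclesFlat ORDER BY VehicleId").fetchall()
print(flat)
```

The trade-off is the one mentioned above: the tyre name is now stored once per vehicle, so renaming a tyre type means updating every matching row.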