I’m using sql-server-2008-R2. I have a table that has three types of data in it, and the types are in another table.
T = Table , F = Field , FK = Foreign Key , PK = Primary Key
T1: F1(PK), F2(TypeID), F3, F4, F5, F6, F7
T2: F1(TypeID, PK), F2(TypeName)
I want to add a fourth type, but this type has an additional property (e.g. TypeRate).
My T1 table will receive at least 3 million records in the first week of the project, and after that growth will slow to about 3 million records per month.
NOW I want to know which method is the best one among the ones listed below:
A. Add a field to the main table (T1):
T1: F1(PK), F2(TypeID), F3, F4, F5, F6, F7, F8(TypeRate)
F8 will be null most of the time (for records of other types), but I’ll have just one table
B. Add another table (T3) with all the fields that T1 has, plus the new one:
T3: F1(PK), F2(TypeID), F3, F4, F5, F6, F7, F8
so that T1 does not have a null value most of the time, but I’ll have two tables which are mostly alike.
C. Add a description table (T4):
T4: F1(PK), F2(FK:T1.PK), F3(TypeRate)
so that my T1 table does not have null value, and for the records of the fourth type the additional data are in T4 (the description table)
You cannot ask for the “best” solution without describing what you are trying to accomplish. Well, I guess you can ask, but it makes the question impossible to answer.
If you are trying to minimize space (memory and disk space), then splitting the table up into two, as implied by option (b), would be the minimum space solution. However, it is very, very unlikely that you would choose this option. The gain in space efficiency is minimal, and splitting one entity into two tables is generally not the best solution.
The first solution incurs roughly one bit of overhead per row for each NULL, a pretty trivial amount of space. This seems like a good solution in many cases, and the data is available without an additional join.
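As a rough illustration of option (a), here is a minimal sketch using Python's built-in sqlite3 module. SQLite is only a stand-in for SQL Server so the example can be run anywhere; the column types, the type names and the sample rows are assumptions, not taken from the question.

```python
import sqlite3

# In-memory SQLite stands in for SQL Server; the DDL would need adapting.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE T2 (TypeID INTEGER PRIMARY KEY, TypeName TEXT NOT NULL);
    CREATE TABLE T1 (
        F1       INTEGER PRIMARY KEY,
        TypeID   INTEGER NOT NULL REFERENCES T2(TypeID),
        F3       TEXT,
        TypeRate REAL  -- option (a): stays NULL for every type except the fourth
    );
    -- Hypothetical sample data
    INSERT INTO T2 VALUES (1, 'first'), (4, 'fourth');
    INSERT INTO T1 VALUES (1, 1, 'x', NULL), (2, 4, 'y', 0.25);
""")

# The extra property is read straight from T1, with no join.
rows_a = conn.execute(
    "SELECT F1, TypeID, TypeRate FROM T1 ORDER BY F1"
).fetchall()
print(rows_a)
```

Rows of the older types simply carry NULL in the new column.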
The third solution is also fine. It requires an additional join to fetch the data. But, if the reference table is small or if you build an index on the key, then the performance overhead should be negligible.
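To make option (c) concrete, here is a sketch of the description table with an index on its foreign key and the join needed to read the rate back. Again, SQLite (through Python's sqlite3 module) is only a stand-in for SQL Server, and the names T1_ID and IX_T4_T1_ID, the column types and the sample rows are assumptions for illustration.

```python
import sqlite3

# In-memory SQLite stands in for SQL Server; the DDL would need adapting.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces FKs when asked
conn.executescript("""
    CREATE TABLE T1 (F1 INTEGER PRIMARY KEY, TypeID INTEGER NOT NULL, F3 TEXT);
    CREATE TABLE T4 (
        F1       INTEGER PRIMARY KEY,
        T1_ID    INTEGER NOT NULL REFERENCES T1(F1),  -- FK back to T1's PK
        TypeRate REAL NOT NULL
    );
    -- Index on the join key so the lookup stays cheap as T1 grows
    CREATE INDEX IX_T4_T1_ID ON T4 (T1_ID);
    -- Hypothetical sample data: only the type-4 row gets a T4 entry
    INSERT INTO T1 VALUES (1, 1, 'x'), (2, 4, 'y');
    INSERT INTO T4 VALUES (10, 2, 0.25);
""")

# LEFT JOIN keeps rows of the other types and yields NULL for their rate.
rows_c = conn.execute("""
    SELECT T1.F1, T1.TypeID, T4.TypeRate
    FROM T1
    LEFT JOIN T4 ON T4.T1_ID = T1.F1
    ORDER BY T1.F1
""").fetchall()
print(rows_c)
```

Queries that never need TypeRate can skip the join entirely and touch only T1.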
There is another solution, which I will call (d). That is to have another table with the same primary key as the first table along with the additional columns. This can be useful when there are multiple different columns that form a natural grouping.
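A sketch of what (d) could look like follows: the extension table reuses T1's primary key as its own, so each T1 row can have at most one set of extra columns. SQLite via Python's sqlite3 module is a stand-in for SQL Server, and the table name T1_Extra, the RateNote column and the sample rows are hypothetical.

```python
import sqlite3

# In-memory SQLite stands in for SQL Server; the DDL would need adapting.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE T1 (F1 INTEGER PRIMARY KEY, TypeID INTEGER NOT NULL, F3 TEXT);
    CREATE TABLE T1_Extra (
        F1       INTEGER PRIMARY KEY REFERENCES T1(F1),  -- same key as T1
        TypeRate REAL NOT NULL,
        RateNote TEXT  -- hypothetical second column in the same grouping
    );
    INSERT INTO T1 VALUES (1, 1, 'x'), (2, 4, 'y');
    INSERT INTO T1_Extra VALUES (2, 0.25, 'hypothetical note');
""")

# The shared primary key rejects a second extension row for the same T1 row.
try:
    conn.execute("INSERT INTO T1_Extra VALUES (2, 0.5, NULL)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

rows_d = conn.execute("""
    SELECT T1.F1, e.TypeRate, e.RateNote
    FROM T1
    LEFT JOIN T1_Extra AS e ON e.F1 = T1.F1
    ORDER BY T1.F1
""").fetchall()
print(duplicate_rejected, rows_d)
```

Compared with (c), there is no separate surrogate key to maintain, and the one-to-one relationship is enforced by the key itself.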
In short, as a general rule, I would go with (c). It maintains the relational integrity of the database with a minimal performance hit. There may be some cases where I would go with (a) or (d), but that depends on the problem and what is considered “best”.