I am trying to construct a database in SQL Server 2008 R2 that will allow users to place their own sub-types into categories. I have a parent table that holds the preset category names (defined by me).
The question I face is: what is the best way to deal with the PRIMARY KEY and UNIQUE constraints and the foreign key REFERENCES? Indexing is at the center of this, as I anticipate that the sub table (we will call it CategoryTypes) will grow quite large over time and will need to support efficient reads based on the parent table (Categories). Is there any problem I would need to anticipate if the tables were laid out as follows?
My concern is that the IDENTITY column in the CategoryTypes table will need to maintain a unique count. The reason I have included this field is to allow a simpler reference when passing data between tiers in the application, by passing an Integer rather than an Integer / String pair. The data in these tables will persist at each layer of the database to save on bandwidth. From a database perspective, does the layout below pose any major challenges once deployed?
To simplify, is there a problem with using a unique ID field (IDENTITY) that is not included in the primary key when a composite key is present? See table layout below:
Parent Table:
CREATE TABLE schema.Categories
(
Id TINYINT PRIMARY KEY NOT NULL,
Name VARCHAR(100) NOT NULL
)
Sub Table (User inserted data over time):
CREATE TABLE schema.CategoryTypes
(
Id INT IDENTITY(1,1) NOT NULL,
CategoryId TINYINT REFERENCES schema.Categories(Id) NOT NULL,
Name VARCHAR(100) NOT NULL,
CONSTRAINT PK_CategoryTypes PRIMARY KEY CLUSTERED(CategoryId, Name),
CONSTRAINT UC_CategoryTypesId UNIQUE NONCLUSTERED(Id)
)
What you are describing sounds kind of like an inheritance structure. I have created an example dataset as far as I understand it. Can you verify this is your intent?
If it is, then this should work fine, but I do not see why you are not setting CategoryTypes.Id as the primary key. If it is not your PK and is not referenced as a FK elsewhere, then I don't see a point to it. I personally don't think you gain enough in bandwidth savings, and you should probably just request the data by CategoryId and Name. In fact, having no such PK is often how inheritance structures are represented (see "How can you represent inheritance in a database?").
If you must keep it the way that you have it set up, I personally suggest setting the Id as the PK and adding a unique constraint on CategoryId/Name.
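A minimal sketch of that layout, in case it helps. The `dbo` schema and the constraint names are my own placeholders, not something from your post, and I am assuming the Categories table already exists as you defined it:

```sql
-- Sketch only: dbo and the constraint names are assumed placeholders.
CREATE TABLE dbo.CategoryTypes
(
    Id         INT IDENTITY(1,1) NOT NULL,
    CategoryId TINYINT NOT NULL REFERENCES dbo.Categories(Id),
    Name       VARCHAR(100) NOT NULL,
    -- The surrogate Id becomes the primary key; with no other clustered
    -- index on the table, SQL Server makes the PK clustered by default.
    CONSTRAINT PK_CategoryTypes PRIMARY KEY (Id),
    -- The natural key is still enforced, so a category cannot
    -- contain two sub-types with the same name.
    CONSTRAINT UC_CategoryTypes_CategoryId_Name
        UNIQUE NONCLUSTERED (CategoryId, Name)
);
```

The unique constraint is backed by an index that leads with CategoryId, so lookups by category can still use it.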
That is just my two cents, though.
UPDATED ANSWER (to directly address performance concerns)
First, I would suggest not worrying about it too much if it is not yet a problem. That is a common mistake many of us make: overcomplicating something that does not need it. That falls under the KISS principle in my book.
However, if you are dead set on trying to figure this out ahead of time the way you explained, then here are my additional thoughts:
Ultimately, I think what you are doing will be fine. However, a PK does not have to be clustered, so I would definitely move the PK to the Id field. It is your choice whether to put the clustered index on CategoryId or on CategoryId/Name, or to try using INCLUDE as I suggested. This really depends on how the tables are being used, so comparing execution plans might help here.
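To make those options concrete, here is a rough sketch rather than a definitive design. The `dbo` schema and the constraint and index names are assumptions on my part, and the Categories table is assumed to exist as in your question:

```sql
-- Sketch only: dbo and all constraint/index names are assumed placeholders.

-- Option A: the PK moves to Id but is NONCLUSTERED, and the table stays
-- physically ordered by (CategoryId, Name) for reads by category.
CREATE TABLE dbo.CategoryTypes
(
    Id         INT IDENTITY(1,1) NOT NULL,
    CategoryId TINYINT NOT NULL REFERENCES dbo.Categories(Id),
    Name       VARCHAR(100) NOT NULL,
    CONSTRAINT PK_CategoryTypes PRIMARY KEY NONCLUSTERED (Id),
    CONSTRAINT UC_CategoryTypes_CategoryId_Name
        UNIQUE CLUSTERED (CategoryId, Name)
);

-- Option B (instead of the clustered unique constraint above): if the PK
-- on Id is the clustered index, a nonclustered index keyed on CategoryId
-- can carry Name in its leaf level through INCLUDE, so reads by category
-- do not have to go back to the base table.
-- CREATE NONCLUSTERED INDEX IX_CategoryTypes_CategoryId
--     ON dbo.CategoryTypes (CategoryId)
--     INCLUDE (Name);
```

Option B does not enforce uniqueness of CategoryId/Name by itself, so you would still want the unique constraint if you go that way. Comparing the execution plans of your real queries against each option is the only way I know to decide between them.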
Hopefully this helps 🙂