Let’s say I have the following table:
Table: RelationshipType
============================================================
| ID (PK) | ParentID | ChildID | RelationshipType |
============================================================
In most cases, ParentID and ChildID are selected on individually:
... WHERE ParentID = @SomeID
... WHERE ChildID = @SomeID
Sometimes both are selected on together:
... WHERE ParentID = @SomeID AND ChildID = @SomeOtherID
I want to increase the performance of these queries but most notably the first two. Should I create a non-clustered index on ParentID + ChildID together or one index on ParentID and another index on ChildID?
EDIT: All of these queries are highly selective (1 or 2 records returned).
Can you get rid of the surrogate key ID?
If yes, consider creating the following:
- A clustering key on {ParentID, ChildID}.
- A secondary index on {ChildID, ParentID}, but include RelationshipType in the index as well (use the INCLUDE keyword).
This way, you have a covering index in all 3 cases, so you don’t have to pay the price of the double lookup (that is normally required for secondary indexes in clustered tables):
- ... WHERE ParentID = @SomeID can be satisfied by a simple seek in the B-Tree of the index {ParentID, ChildID}. The values of ChildID and RelationshipType [1] can be retrieved directly from the found leaf of this B-Tree.
- ... WHERE ChildID = @SomeID can be satisfied by a simple seek in the B-Tree of the index {ChildID, ParentID}. The values of ParentID and RelationshipType [2] can be retrieved directly from the found leaf of this B-Tree.
- ... WHERE ParentID = @SomeID AND ChildID = @SomeOtherID can be satisfied by either.
[1] The clustering key is the “main” B-Tree for the table and includes all columns, not just the key columns.
[2] Thanks to INCLUDE (RelationshipType).
Doing something similar with the ID present is possible, but would require 3 indexes instead of 2, and all of them would be fatter to achieve covering. You’d have to measure to make sure, but my feeling is that this would be more trouble than it’s worth.
Otherwise, don’t use clustering at all. Just create normal indexes on:
- {ID}: regular, non-clustering primary index (use the NONCLUSTERED keyword).
- {ParentID}: regular secondary index.
- {ChildID}: regular secondary index.
You’ll have a normal heap table, so each access will require an index seek + (usually) table heap access, but your indexes will be kept slim, raising the cache effectiveness.
... WHERE ParentID = @SomeID AND ChildID = @SomeOtherID would require two index seeks (or possibly a seek on either the {ParentID} or the {ChildID} index + table heap access), but this is still pretty fast and is not too frequent (as you stated).
Please do measure on realistic amounts of data before deciding either way.
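For reference, the two layouts described above could be sketched in T-SQL roughly as follows. This is only an illustration under assumptions: the INT column types, the NOT NULL constraints, the table-name suffixes (_Clustered and _Heap, used so both variants can coexist in one script) and all constraint and index names are made up here, and the first layout assumes each {ParentID, ChildID} pair is unique.

```sql
-- Layout 1 (assumed names/types): no surrogate ID, table clustered on the natural key.
CREATE TABLE dbo.RelationshipType_Clustered (
    ParentID         INT NOT NULL,
    ChildID          INT NOT NULL,
    RelationshipType INT NOT NULL,
    -- The clustered key is the table's main B-Tree and carries every column.
    CONSTRAINT PK_RelationshipType_Clustered
        PRIMARY KEY CLUSTERED (ParentID, ChildID)
);

-- Secondary index for lookups by ChildID; INCLUDE makes it covering.
CREATE NONCLUSTERED INDEX IX_RelationshipType_Clustered_Child_Parent
    ON dbo.RelationshipType_Clustered (ChildID, ParentID)
    INCLUDE (RelationshipType);

-- Layout 2 (assumed names/types): keep ID, no clustered index, so the table is a heap.
CREATE TABLE dbo.RelationshipType_Heap (
    ID               INT NOT NULL,
    ParentID         INT NOT NULL,
    ChildID          INT NOT NULL,
    RelationshipType INT NOT NULL,
    CONSTRAINT PK_RelationshipType_Heap
        PRIMARY KEY NONCLUSTERED (ID)
);

-- Slim single-column secondary indexes; each seek is followed by a heap lookup.
CREATE NONCLUSTERED INDEX IX_RelationshipType_Heap_Parent
    ON dbo.RelationshipType_Heap (ParentID);

CREATE NONCLUSTERED INDEX IX_RelationshipType_Heap_Child
    ON dbo.RelationshipType_Heap (ChildID);
```

Loading both tables with a realistic volume of data and comparing the actual execution plans for the three WHERE patterns would be one way to do the measurement suggested above.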