I have a multi-tenant application and I want a clustered index on the data to support fast range queries.
If I design my clustered index like this:
(SystemID, EntityID, IsHidden)
SystemID is the unique identifier for the tenant instance, EntityID is an identity column for the entity, and IsHidden is a flag indicating whether the row shows up in results or not. Will SQL Server be able to efficiently throw out all data not belonging to the system, as well as the hidden data? And does the order in which these columns are specified matter?
If I have a query like so:
SELECT * FROM MyTable WHERE SystemID = @pSystemID AND IsHidden = 0
I guess what I’m trying to do is effectively partition the table so that all rows belonging to a specific system, as well as the hidden rows, are physically grouped close together. That way, they can be easily discarded depending on the query against that data.
Is this good or bad? (I’m leaning towards good, I’m not expecting a lot of inserts to be taking place)
Make it like this instead:
(SystemID, IsHidden, EntityID)
Having the IsHidden column after the EntityID would make it basically useless, since the EntityID is already unique. Searching for the criteria you give as an example (WHERE SystemID = @SystemID AND IsHidden = 0) would still have to scan the entire range of that tenant, since the rows with IsHidden = 0 are spread out across the entire range. Moving this column before EntityID allows for much more efficient range scans.
One problem you’ll face is that searching for a specific EntityID (WHERE EntityID = @EntityID) will be inefficient by default. You can improve things by adding a non-clustered index on EntityID, but that will only solve part of the problem. The bulk of the issues will arise from joins with other tables, like a details table that joins on the EntityID/ParentEntityID keys alone.
As these queries get more complex and the range of candidate rows increases, the efficiency of the non-clustered indexes on the EntityID/ParentEntityID keys starts to decrease, until they hit the tipping point and are basically ignored. If possible, make sure all these joins specify the clustered index key instead.
The problem will be that most modeling tools (like EF or LINQ) will tend to join by the logical primary key (the EntityID) as opposed to the physical clustered key.
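To make the suggestion concrete, here is a minimal T-SQL sketch of the index layout and the two join styles discussed above. Only SystemID, IsHidden, EntityID, ParentEntityID, and MyTable come from the discussion; the details table, its own SystemID column, the Payload column, and all index names are assumptions made for illustration, not your actual schema.

```sql
-- Hypothetical parent table; Payload stands in for the real columns.
CREATE TABLE dbo.MyTable (
    SystemID INT NOT NULL,
    IsHidden BIT NOT NULL,
    EntityID INT IDENTITY(1,1) NOT NULL,
    Payload  NVARCHAR(200) NULL
);

-- Clustered key ordered so each tenant's visible rows form one contiguous range.
CREATE UNIQUE CLUSTERED INDEX CIX_MyTable
    ON dbo.MyTable (SystemID, IsHidden, EntityID);

-- Non-clustered index for point lookups by the logical key alone.
CREATE UNIQUE NONCLUSTERED INDEX IX_MyTable_EntityID
    ON dbo.MyTable (EntityID);

-- Hypothetical details table; it is assumed to carry SystemID as well,
-- so that joins can name the leading clustered key column.
CREATE TABLE dbo.MyDetails (
    SystemID       INT NOT NULL,
    ParentEntityID INT NOT NULL,
    DetailID       INT IDENTITY(1,1) NOT NULL
);

CREATE UNIQUE CLUSTERED INDEX CIX_MyDetails
    ON dbo.MyDetails (SystemID, ParentEntityID, DetailID);

DECLARE @pSystemID INT = 1;

-- The query from the question: both predicates match the leading key columns.
SELECT *
FROM dbo.MyTable
WHERE SystemID = @pSystemID AND IsHidden = 0;

-- Join by the logical key only (what modeling tools tend to generate).
SELECT e.EntityID, d.DetailID
FROM dbo.MyTable AS e
JOIN dbo.MyDetails AS d
    ON d.ParentEntityID = e.EntityID
WHERE e.SystemID = @pSystemID AND e.IsHidden = 0;

-- Join that also names the tenant column, so both sides can use
-- the leading column of their clustered keys.
SELECT e.EntityID, d.DetailID
FROM dbo.MyTable AS e
JOIN dbo.MyDetails AS d
    ON d.SystemID = e.SystemID
   AND d.ParentEntityID = e.EntityID
WHERE e.SystemID = @pSystemID AND e.IsHidden = 0;
```

Comparing the actual execution plans of the last two queries on realistic data volumes is the simplest way to check whether the optimizer picks seeks on the clustered keys in your case.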