Suppose I have a simple table like this:
ID (PRIMARY)
time (INT)
stage (TINYINT)
other fields...
I have to do a range search on time while doing a normal (equality) selection on stage. An example SQL query:
SELECT * FROM table WHERE time>10000 AND (stage=1 OR stage=3 OR stage=4)
VERY IMPORTANT: There are a lot of rows with stage = 2, let’s say 99% of the table. There are only 5 distinct stage values.
What would be the proper indexing for this table?
It depends on the distribution of the values in the different columns.
If you have very few possible stage values, you will probably get the best performance with either a separate index on time and one on stage, or with a combined index on (time, stage).
But if you have lots of distinct stage values, it might be faster to order the index the other way around: (stage, time).
But using OR makes the stage search more fragmented compared to an AND search. Therefore I would try to have time first in the index.
The only way to know for sure on your specific set of data is to try and measure, but the three candidates mentioned above are my top candidates for indexes.
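To make "try and measure" concrete, here is a minimal sketch that builds a skewed test table and runs the query from the question against each of the three index candidates. It uses Python's built-in sqlite3 module with an in-memory database purely as a stand-in; the table name events, the index names, the row count and the stage values 0 to 4 are all assumptions made for illustration. Another engine may pick different plans, so the timings only mean something when measured on your real database and data.

```python
import random
import sqlite3
import time as clock  # aliased to avoid confusion with the "time" column

QUERY = ("SELECT * FROM events "
         "WHERE time > 10000 AND (stage = 1 OR stage = 3 OR stage = 4)")

# The three index candidates discussed above (index names are hypothetical).
CANDIDATES = {
    "separate time + stage": ["CREATE INDEX ix_time ON events (time)",
                              "CREATE INDEX ix_stage ON events (stage)"],
    "combined (time, stage)": ["CREATE INDEX ix_time_stage ON events (time, stage)"],
    "combined (stage, time)": ["CREATE INDEX ix_stage_time ON events (stage, time)"],
}

def build_table(n_rows=20000, seed=0):
    """Create an in-memory table where about 99% of rows have stage = 2."""
    rng = random.Random(seed)
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE events (id INTEGER PRIMARY KEY, time INT, stage TINYINT)")
    rows = [(rng.randrange(20000),
             2 if rng.random() < 0.99 else rng.choice([0, 1, 3, 4]))
            for _ in range(n_rows)]
    conn.executemany("INSERT INTO events (time, stage) VALUES (?, ?)", rows)
    conn.commit()
    return conn

def measure(index_ddl, repeats=20):
    """Return (matching row count, query plan details, seconds for all runs)."""
    conn = build_table()
    for ddl in index_ddl:
        conn.execute(ddl)
    conn.execute("ANALYZE")  # gather statistics so the planner can see the skew
    # The last column of each EXPLAIN QUERY PLAN row is the readable detail.
    plan = [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + QUERY)]
    start = clock.perf_counter()
    for _ in range(repeats):
        matched = len(conn.execute(QUERY).fetchall())
    elapsed = clock.perf_counter() - start
    conn.close()
    return matched, plan, elapsed

if __name__ == "__main__":
    for name, ddl in CANDIDATES.items():
        matched, plan, elapsed = measure(ddl)
        print(f"{name}: {matched} rows, {elapsed:.4f}s, plan={plan}")
```

Every candidate must return the same rows; only the plan and the elapsed time should differ. Compare those two on a copy of your production data before settling on an index.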
Edit
You might want to create a clustered index on time, possibly on (time, stage), if most of your queries search by time range. This way you minimize lookups in the table once you have found the correct rows in the index.
Beware that this can create a fragmented dataspace if time is strictly increasing when you insert new records.
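How you declare a clustered index depends on the engine, so the following is only a sketch under stated assumptions. It again uses Python's sqlite3 module, where a WITHOUT ROWID table stores the rows inside the primary-key B-tree and therefore behaves like a table clustered on that key. The table name events, the sample data and the helper rows_after are hypothetical. Because a primary key must be unique, id is appended to the key as a tie-breaker.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# WITHOUT ROWID makes SQLite keep the rows in primary-key order,
# so the table itself is physically organised by (time, stage, id).
conn.execute("""
    CREATE TABLE events (
        id    INTEGER NOT NULL,
        time  INT     NOT NULL,
        stage TINYINT NOT NULL,
        PRIMARY KEY (time, stage, id)
    ) WITHOUT ROWID
""")

# Hypothetical sample data: 2000 rows, 1% with stage = 1, the rest stage = 2.
conn.executemany(
    "INSERT INTO events (id, time, stage) VALUES (?, ?, ?)",
    [(i, i * 10, 1 if i % 100 == 0 else 2) for i in range(1, 2001)],
)

def rows_after(conn, min_time):
    """Run the question's query; the range on time walks the clustered key."""
    return conn.execute(
        "SELECT id, time, stage FROM events "
        "WHERE time > ? AND (stage = 1 OR stage = 3 OR stage = 4)",
        (min_time,),
    ).fetchall()

if __name__ == "__main__":
    for row in conn.execute(
            "EXPLAIN QUERY PLAN SELECT * FROM events WHERE time > 10000"):
        print(row[-1])  # readable plan detail
    print(len(rows_after(conn, 10000)), "matching rows")
```

With this layout the rows found by the time range are already the table rows, so no second lookup is needed. In other engines the same idea is expressed differently, for example by making the clustered key distinct from the ID primary key.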