An index on two columns can be created with either of these statements:
create index foo_ix on foo(a,b);
create index foo_ix on foo(b,a);
1. How does this affect the operational (runtime) characteristics of using the index?
2. How does this affect the layout (physical) characteristics of the index?
3. Is either (1) or (2) affected by the types/sizes of the columns?
4. What are the best practices for creating multi-column indexes?
In short, does it matter which column I put first?
If a and b both have 1000 distinct values and they are always queried together, then the order of columns in the index doesn’t really matter. But if a has only 10 distinct values, or you have queries which use just one of the columns, then it does matter; in these scenarios the index may not be used if the column ordering does not suit the query.
The one potential exception to 2. and 3. is with DATE columns. Because Oracle DATE columns include a time element they might have 86,400 distinct values per day. However, most queries on a date column are only interested in the day element, so you might want to consider only the number of distinct days in your calculations. That said, I suspect this will change the relative selectivity in only a handful of cases.
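To see why a query on just one column cares about column order, here is a toy model. It is an assumption for illustration, not Oracle's actual B-tree code. The leaf entries of a composite index are treated as a list sorted by (leading, trailing) values. An equality predicate on the leading column matches one contiguous run that two binary searches can find. A predicate on the trailing column alone matches entries scattered through the whole index. The data and function names are hypothetical, and the model ignores Oracle's index skip scan, which can sometimes help in the trailing-column case.

```python
from bisect import bisect_left, bisect_right

# Hypothetical data: a has 10 distinct values, b has 100.
rows = [(a, b) for a in range(10) for b in range(100)]
index_ab = sorted(rows)  # toy stand-in for the leaf entries of foo(a,b)

def lookup_leading(index, value):
    """Equality on the leading column: matches form one contiguous run,
    so only the matching entries need to be read."""
    lo = bisect_left(index, (value,))                      # first entry with a == value
    hi = bisect_right(index, (value, float("inf")))        # just past the last one
    return index[lo:hi], hi - lo                           # (matches, entries examined)

def lookup_trailing(index, value):
    """Equality on the trailing column only: matches are spread across the
    whole index, so every entry must be examined (a full index scan)."""
    matches = [entry for entry in index if entry[1] == value]
    return matches, len(index)

m1, examined1 = lookup_leading(index_ab, 3)
m2, examined2 = lookup_trailing(index_ab, 42)
print(len(m1), examined1)  # 100 100
print(len(m2), examined2)  # 10 1000
```

With the order reversed to foo(b,a), the situation flips: the b lookup becomes the contiguous one.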
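Here is a small sketch of the DATE point. It assumes a hypothetical column that receives one value every 10 minutes for three days. It compares the number of distinct timestamps with the number of distinct days. Taking `.date()` in Python roughly corresponds to applying Oracle's `TRUNC` to a DATE, which drops the time portion.

```python
from datetime import datetime, timedelta

# Hypothetical DATE column: one value every 10 minutes for 3 days.
start = datetime(2024, 1, 1)
dates = [start + timedelta(minutes=10 * i) for i in range(3 * 24 * 6)]

distinct_timestamps = len(set(dates))            # what the raw column holds
distinct_days = len({d.date() for d in dates})   # what most queries care about
print(distinct_timestamps, distinct_days)  # 432 3
```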
Edit (in response to Nick Pierpoint’s comment)
The two main reasons for leading with the least selective column are index compression and index skip scans. Both work their magic from knowing that the value in the current slot is the same as the value in the previous slot. Consequently we can maximize the return from these techniques by minimising the number of times the value changes. In the following example, A has four distinct values and B has six. The dittos represent a compressible value or a skippable index block.
Most selective column leads …
Even in this trivial example, (A, B) has 20 skippable slots compared to the 18 of (B, A). A wider disparity would give a greater return on index compression or better utility from index skip scans.
As with most tuning heuristics, we need to benchmark using actual values and realistic volumes. This is definitely a scenario where data skew could have a dramatic impact on the effectiveness of different approaches.
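The 20-versus-18 count can be reproduced with a short sketch. It assumes, as the numbers imply, that the example contains every combination of A's four values and B's six values (4 × 6 = 24 entries). It counts the "ditto" slots, meaning entries whose leading-column value repeats the previous entry's. The value labels and the helper name are hypothetical.

```python
from itertools import product

def repeated_leading_slots(entries):
    """Count index entries whose leading-column value equals the previous
    entry's: the 'ditto' slots that compression or a skip scan can exploit."""
    ordered = sorted(entries)
    return sum(1 for prev, cur in zip(ordered, ordered[1:]) if prev[0] == cur[0])

# Assumed data: every combination of A (4 distinct values) and B (6 distinct values).
a_values = ["a1", "a2", "a3", "a4"]
b_values = ["b1", "b2", "b3", "b4", "b5", "b6"]
rows = list(product(a_values, b_values))

ab = repeated_leading_slots(rows)                       # index on (A, B)
ba = repeated_leading_slots([(b, a) for a, b in rows])  # index on (B, A)
print(ab, ba)  # 20 18
```

In general, with every combination present, the count is the number of entries minus the number of distinct leading values. Leading with the column that has fewer distinct values therefore always yields more repeats.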
If we have a highly selective column then we should build it an index of its own. The benefit of avoiding a FILTER operation on a handful of rows is unlikely to outweigh the overhead of maintaining a composite index.
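This minimal in-memory sketch shows the reasoning behind that advice. It uses a hypothetical table where b is highly selective (two rows per value) and a has ten values. A single-column index on b narrows a lookup to two rows, and the remaining predicate on a is a FILTER over just those rows. The table, the dict-based "index" and the `query` helper are all illustrative assumptions. Oracle's real access paths are far more involved.

```python
from collections import defaultdict

# Hypothetical table: b is highly selective (2 rows per value), a has 10 values.
rows = [{"id": i, "a": i % 10, "b": i // 2} for i in range(10_000)]

# Toy single-column index on the selective column b: value -> row ids.
index_b = defaultdict(list)
for row in rows:
    index_b[row["b"]].append(row["id"])

def query(a_value, b_value):
    """Index lookup on b, then FILTER the few returned rows on a."""
    candidates = [rows[i] for i in index_b.get(b_value, [])]
    return [r for r in candidates if r["a"] == a_value], len(candidates)

result, filtered = query(8, 1234)
print(len(result), filtered)  # 1 2
```

A composite (a, b) index would only save the comparison on those two candidate rows, and it would add maintenance cost on every insert, update and delete.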
Multi-column indexes are most useful when we have: