I have the following query:
SELECT MAX([LastModifiedTime]) FROM Workflow
There are approximately 400M rows in the Workflow table. There is an index on the LastModifiedTime column as follows:
CREATE NONCLUSTERED INDEX [IX_Workflow_LastModifiedTime] ON [dbo].[Workflow]
(
[LastModifiedTime] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, FILLFACTOR = 100)
The above query takes 1.5 minutes to execute. Why wouldn’t SQL Server use the above index and simply retrieve the last row in the index to get the maximum value?
BTW, the query plan for this query shows an index scan being done on the above index.
Thanks.
Mysterious are the ways of the query optimizer…
If possible, I'd recommend changing the query like this:
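A sketch of the rewrite, inferred from the description that follows (a TOP over a reverse-ordered read of the index). The exact statement is an assumption; only the table, column and index names come from the question:

```sql
-- Ask for the single highest value by ordering descending and taking one row.
-- With IX_Workflow_LastModifiedTime in place, the optimizer can read the
-- index backward and stop after the first row instead of aggregating.
SELECT TOP (1) [LastModifiedTime]
FROM [dbo].[Workflow]
ORDER BY [LastModifiedTime] DESC;
```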
This is semantically equivalent as long as the table is not empty (on an empty table MAX returns a single NULL row, while a TOP (1) query returns no rows), and the optimizer will no longer consider the MAX aggregate over a scan, which is apparently what it does right now. It may consider doing a SORT in a worktable, but hopefully the estimated cost of such a plan would be much higher than the cost of the reverse-order seek.
As to why the optimizer chooses what is apparently an obviously bad plan, there are usually many factors involved, and it is hard to diagnose from an SO post alone. In general, having an ASC index does not always substitute for the lack of a DESC index, and your particular column statistics (distribution) may have hit some tipping point inside the query optimizer where it decided to choose scan + aggregate instead of reverse scan + top.
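One way to see which plan each form actually gets is to run both with I/O and timing statistics switched on and compare the logical reads. This is only a diagnostic sketch: the TOP (1) form is an assumed rewrite, and the numbers will depend on your data and statistics.

```sql
-- Report logical reads and elapsed time for each statement in this session.
SET STATISTICS IO ON;
SET STATISTICS TIME ON;

-- Original form: aggregate over the index.
SELECT MAX([LastModifiedTime])
FROM [dbo].[Workflow];

-- Assumed rewrite: one row from a descending ordered read.
SELECT TOP (1) [LastModifiedTime]
FROM [dbo].[Workflow]
ORDER BY [LastModifiedTime] DESC;

SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;
```

If the second statement reports only a handful of logical reads while the first reports reads proportional to the size of the index, that would be consistent with the scan + aggregate versus reverse scan + top explanation above.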