I am using EntityFramework 4 to access a SQL Server 2008 database.
One of the SQL queries that the EF generates is having a behavior that I cannot explain.
The query is like this:
SELECT tableA.field1, tableA.field2, ...
FROM tableA join tableB on tableA.field1 = tableB.field1
WHERE
tableA.field2 > '20110825'
and tableA.field3 in ('a', 'b', 'c,')
and tableB.field4 = 'xxx'
Where tableA.field2 is datetime not null, and the other fields are varchars.
tableA contains circa 1.5 million records, tableB contains circa 2 million records, and the query returns 1877 rows.
The problem is, it returns them in 86 seconds, and that time changes dramatically when I change the ‘20110825’ literal to older values.
For instance if I put ‘20110725’ the query returns 3483 rows in 35 milliseconds.
I found out in the execution plan that the difference between the two lies in the indexes SQL Server chooses to use depending on the date used to compare.
When it is taking time, the execution plan shows:
- 50%: index seek on tableA.field2 (it’s a clustered index on this field alone)
- 50%: index seek on tableB.field1 (non-unique, non-clustered index on this field alone)
- 0%: join
When it is almost instantaneous, the execution plan shows:
- 98%: index seek on tableA.field1 (non-unique, non-clustered index on this field alone)
- 2%: index seek on tableB.field1 (non-unique, non-clustered index on this field alone)
- 0%: join
So it seems to me that the decision of the optimizer to use the clustered index on tableA.field2 is not optimal.
Is there a flaw in the database design? In the SQL query?
Can I force in any way the database to use the correct execution plan?
Given that you are using literal values and are only encountering the issue with recent date strings I would suspect you are hitting the issue described here and need to schedule a job to update your statistics.
Presumably when they were last updated there were few or no rows meeting the
'20110825'criteria and SQL Server is using a join strategy predicated on that assumption.