I have an SQL query (for SQL Server 2008 R2) that takes a very long time to complete. Is there a better way of doing it?
SELECT @count = COUNT(Name)
FROM Table1 t
WHERE t.Name = @name AND t.Code NOT IN (SELECT Code FROM ExcludedCodes)
Table1 has around 90 million rows and is indexed by Name and Code.
ExcludedCodes only has around 30 rows in it.
This query is in a stored procedure and gets called around 40k times; in total, the procedure takes 27 minutes to finish. I believe this is my biggest bottleneck because of the massive number of rows it queries against and the number of times it does so.
So if you know of a good way to optimize this, it would be greatly appreciated! If it cannot be optimized then I guess I'm stuck with 27 min…
EDIT
I changed the NOT IN to NOT EXISTS and it cut the time down to 10:59, so that alone is a massive gain on my part. I am still going to attempt the GROUP BY statement as suggested below, but that will require a complete rewrite of the stored procedure and might take some time… (as I said before, I'm not the best at SQL but it is starting to grow on me. ^^)
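For reference, the NOT EXISTS form of the query would look roughly like this. It is only a sketch: the dbo schema, the VARCHAR(100) type, and the sample value are assumptions added so the snippet runs on its own.

```sql
-- Declared here only so the snippet is standalone; inside the stored
-- procedure these variables already exist. The type is an assumption.
DECLARE @name  VARCHAR(100) = 'example';
DECLARE @count INT;

-- Same filter as the original query, with NOT IN swapped for NOT EXISTS.
SELECT @count = COUNT(t.Name)
FROM dbo.Table1 AS t
WHERE t.Name = @name
  AND NOT EXISTS (SELECT 1
                  FROM dbo.ExcludedCodes AS e
                  WHERE e.Code = t.Code);
```

The two forms differ when NULLs are involved: if ExcludedCodes.Code can contain a NULL, NOT IN returns no rows at all, while NOT EXISTS simply ignores the NULL.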
In addition to workarounds to get the query itself to respond faster, have you considered maintaining a column in the table that records whether each row's Code is in the excluded set? It requires a lot of maintenance, but if the ExcludedCodes table does not change often, it might be better to do that maintenance. For example, you could add a BIT column, make it NOT NULL with a default of 0, and then create a filtered index on it:
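A minimal sketch of those two steps follows. The column name IsExcluded, the constraint and index names, and the dbo schema are all hypothetical; only Table1 and Name come from the question.

```sql
-- Hypothetical flag column: 1 = the row's Code is in ExcludedCodes, 0 = it is not.
ALTER TABLE dbo.Table1
    ADD IsExcluded BIT NOT NULL
    CONSTRAINT DF_Table1_IsExcluded DEFAULT (0);

-- Filtered index (available from SQL Server 2008) that only contains
-- the rows the count query cares about.
CREATE NONCLUSTERED INDEX IX_Table1_Name_NotExcluded
    ON dbo.Table1 (Name)
    WHERE IsExcluded = 0;
```

On a table with around 90 million rows, both statements are likely to take a while, so they are better run in a maintenance window.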
Now you just have to update the table once:
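The one-time update could look something like this, assuming the flag column is called IsExcluded (a hypothetical name) and both tables live in the dbo schema:

```sql
-- Flag every existing row whose Code appears in ExcludedCodes.
UPDATE t
SET    t.IsExcluded = 1
FROM   dbo.Table1 AS t
WHERE  EXISTS (SELECT 1
               FROM dbo.ExcludedCodes AS e
               WHERE e.Code = t.Code);
```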
Going forward, you'd have to maintain this with triggers on both tables. With this in place, your query becomes:
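As a sketch, with the hypothetical IsExcluded flag column in place, the count no longer needs to touch ExcludedCodes at all. The variable declarations and their types are assumptions added so the snippet runs on its own.

```sql
-- In the real stored procedure these variables already exist;
-- the VARCHAR(100) type and the sample value are assumptions.
DECLARE @name  VARCHAR(100) = 'example';
DECLARE @count INT;

-- COUNT(*) matches COUNT(Name) here, because the Name = @name
-- predicate already rules out rows where Name is NULL.
SELECT @count = COUNT(*)
FROM dbo.Table1 AS t
WHERE t.Name = @name
  AND t.IsExcluded = 0;
```

Writing the predicate as the literal IsExcluded = 0 matters, since that is what lets the optimizer match a filtered index defined with the same condition.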
EDIT
As for “NOT IN being slower than LEFT JOIN”, here is a simple test I performed on only a few thousand rows:
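A comparison of this kind can be sketched as below. The temp tables, row counts, and generated values are all made up for illustration, and no particular timings are implied; the point is only to show how the three forms can be run side by side with SET STATISTICS TIME ON.

```sql
-- Hypothetical temp tables standing in for Table1 and ExcludedCodes.
CREATE TABLE #t (Name VARCHAR(32) NOT NULL, Code INT NOT NULL);
CREATE TABLE #e (Code INT NOT NULL PRIMARY KEY);

-- Generate 5000 rows: 50 distinct names and 200 distinct codes.
INSERT #t (Name, Code)
SELECT TOP (5000)
       'name' + CAST(ROW_NUMBER() OVER (ORDER BY a.object_id) % 50 AS VARCHAR(10)),
       ROW_NUMBER() OVER (ORDER BY a.object_id) % 200
FROM sys.all_objects AS a
CROSS JOIN sys.all_objects AS b;

-- Exclude 30 of the codes, mirroring the size of ExcludedCodes.
INSERT #e (Code)
SELECT DISTINCT TOP (30) Code FROM #t ORDER BY Code;

DECLARE @name VARCHAR(32) = 'name7';

SET STATISTICS TIME ON;

-- 1. NOT IN
SELECT COUNT(*) FROM #t AS t
WHERE t.Name = @name AND t.Code NOT IN (SELECT Code FROM #e);

-- 2. NOT EXISTS
SELECT COUNT(*) FROM #t AS t
WHERE t.Name = @name
  AND NOT EXISTS (SELECT 1 FROM #e AS e WHERE e.Code = t.Code);

-- 3. LEFT JOIN ... IS NULL
SELECT COUNT(*) FROM #t AS t
LEFT OUTER JOIN #e AS e ON e.Code = t.Code
WHERE t.Name = @name AND e.Code IS NULL;

SET STATISTICS TIME OFF;

DROP TABLE #t;
DROP TABLE #e;
```

Comparing the actual execution plans for the three statements is usually more telling than the elapsed times on a data set this small.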
EDIT 2
I’m not sure why this query wouldn’t do what you’re after, and be far more efficient than your 40K loop:
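A set-based version along these lines would compute the count for every name in a single pass instead of once per call. This is a sketch under assumptions: the dbo schema and the NameCount alias are made up, and it presumes the 40K calls are all asking for per-name counts.

```sql
-- One row per Name, counting only rows whose Code is not excluded.
SELECT t.Name, COUNT(*) AS NameCount
FROM dbo.Table1 AS t
WHERE NOT EXISTS (SELECT 1
                  FROM dbo.ExcludedCodes AS e
                  WHERE e.Code = t.Code)
GROUP BY t.Name;
```

One difference from the per-name query: a name whose rows are all excluded does not appear in this result at all, whereas the original query would return a count of 0 for it.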
Or the LEFT JOIN equivalent:
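As a sketch, the LEFT JOIN form keeps only the Table1 rows that found no match in ExcludedCodes and then groups them by name. The dbo schema and the NameCount alias are assumptions.

```sql
-- Anti-join: rows with no matching excluded code have e.Code = NULL.
SELECT t.Name, COUNT(*) AS NameCount
FROM dbo.Table1 AS t
LEFT OUTER JOIN dbo.ExcludedCodes AS e
       ON e.Code = t.Code
WHERE e.Code IS NULL
GROUP BY t.Name;
```

The result could be loaded into a temp table or table variable so that the rest of the procedure looks counts up by name rather than re-querying Table1.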
I would put money on either of those queries taking less than 27 minutes. I would even suggest that running both queries sequentially will be far faster than your one query that takes 27 minutes.
Finally, you might consider an indexed view. I don't know your table structure or whether you violate any of the restrictions, but it is worth investigating IMHO.
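One possible shape for such a view is sketched below. It pre-aggregates Table1 by Name and Code, since indexed views cannot contain subqueries or outer joins, and leaves the exclusion filter to the query. The view and index names, the dbo schema, and the variable types are all assumptions, and whether this helps depends on how many distinct (Name, Code) pairs the table holds.

```sql
-- Indexed views require SCHEMABINDING, two-part object names,
-- and COUNT_BIG(*) whenever GROUP BY is used.
CREATE VIEW dbo.vTable1NameCode
WITH SCHEMABINDING
AS
SELECT t.Name, t.Code, COUNT_BIG(*) AS RowCnt
FROM dbo.Table1 AS t
GROUP BY t.Name, t.Code;
GO

-- The unique clustered index is what materializes the view.
CREATE UNIQUE CLUSTERED INDEX IX_vTable1NameCode
    ON dbo.vTable1NameCode (Name, Code);
GO

-- Hypothetical declarations so the query runs standalone.
DECLARE @name  VARCHAR(100) = 'example';
DECLARE @count BIGINT;

-- NOEXPAND asks the optimizer to read the view's index directly,
-- which is needed on editions other than Enterprise.
SELECT @count = ISNULL(SUM(v.RowCnt), 0)
FROM dbo.vTable1NameCode AS v WITH (NOEXPAND)
WHERE v.Name = @name
  AND NOT EXISTS (SELECT 1
                  FROM dbo.ExcludedCodes AS e
                  WHERE e.Code = v.Code);
```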