I’m maintaining stored procedures for SQL Server 2005, and I wish I could use a feature that is new in SQL Server 2008: the query hint OPTIMIZE FOR UNKNOWN.
It seems that the following procedure (written for SQL Server 2005) estimates the same number of rows (i.e. the same selectivity) as it would if OPTION (OPTIMIZE FOR UNKNOWN) were specified:
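```sql
CREATE PROCEDURE SwartTest(@productid INT)
AS
-- Copy the parameter into a local variable so the optimizer
-- cannot sniff its value when the plan is compiled.
DECLARE @newproductid INT
SET @newproductid = @productid
SELECT ProductID
FROM Sales.SalesOrderDetail
WHERE ProductID = @newproductid
```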
This query avoids parameter sniffing by copying the parameter into a new local variable, whose value the optimizer cannot see when it compiles the plan. Is this really a SQL Server 2005 workaround for the OPTIMIZE FOR UNKNOWN feature, or am I missing something? (Authoritative links, answers, or test results are appreciated.)
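For a side-by-side comparison, a SQL Server 2008 version of the same procedure might look like the sketch below. The procedure name SwartTestUnknown is hypothetical; the hint is the documented OPTIMIZE FOR UNKNOWN query hint, which does not exist in SQL Server 2005, so this version compiles only on 2008 or later.

```sql
-- Hypothetical 2008-only counterpart of SwartTest, for comparison.
-- OPTIMIZE FOR UNKNOWN tells the optimizer to build the plan from
-- statistics rather than from the sniffed @productid value.
CREATE PROCEDURE SwartTestUnknown(@productid INT)
AS
SELECT ProductID
FROM Sales.SalesOrderDetail
WHERE ProductID = @productid
OPTION (OPTIMIZE FOR UNKNOWN);
```

SQL Server 2008 also accepts the per-variable form, OPTION (OPTIMIZE FOR (@productid UNKNOWN)), which is useful when only some parameters should be treated as unknown.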
More Info:
A quick test on SQL Server 2008 tells me that the estimated number of rows for this query is in fact the same as when OPTIMIZE FOR UNKNOWN is specified. Is the behavior the same on SQL Server 2005? I think I remember hearing once that, without more information, the SQL Server query optimizer has to guess at the selectivity of the predicate (e.g. a fixed 30% guess for inequality predicates). I’m still looking for definitive information on the SQL Server 2005 behavior, though I’m not sure that information exists…
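One way to repeat this test on either version is to request the estimated plan as XML without executing the procedure. This is a sketch, assuming the SwartTest procedure above exists in AdventureWorks; the value 870 is an arbitrary sample ProductID chosen only for illustration.

```sql
-- Return the estimated plan as XML instead of running the statements.
-- SET SHOWPLAN_XML must be the only statement in its batch, hence the GOs.
SET SHOWPLAN_XML ON;
GO
-- 870 is an arbitrary sample ProductID used only for illustration.
EXEC SwartTest @productid = 870;
GO
SET SHOWPLAN_XML OFF;
GO
```

In the returned XML, compare the EstimateRows attribute on the seek or scan operator across versions, and check whether a ParameterCompiledValue attribute appears. GO is a batch separator understood by client tools such as SSMS and sqlcmd, not a T-SQL statement.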
More Info 2:
To be clear, this question is asking for a comparison of the UNKNOWN query hint and the parameter-masking technique I describe.
It’s a technical question, not a problem-solving question. I considered many other options and settled on this one, so the only goal of this question is to give me some confidence that the two methods are equivalent.
Okay, so I’ve done some experimenting. I’ll write up the results here, but first I want to say that, based on what I’ve seen and know, I’m confident that copying parameters into local variables in 2005 and 2008 is exactly equivalent to using 2008’s OPTIMIZE FOR UNKNOWN, at least in the context of stored procedures.
So this is what I’ve found.
In the procedure above, I’m using the AdventureWorks database (but I used similar methods and got similar results with other databases). I ran:
And I see statistics with 200 steps in the histogram. Looking at the histogram, I see that there are 66 distinct range rows in total (i.e. 66 distinct values that fall between step boundaries rather than being stored as equality values). Adding the 200 step boundary values (one per step), I get an estimate of 266 distinct values for ProductID in Sales.SalesOrderDetail.
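The density vector offers a shortcut to the same figure, because for the leading column its "All density" is 1 divided by the number of distinct values. This sketch assumes the statistics belong to the AdventureWorks index IX_SalesOrderDetail_ProductID; substitute whichever statistics you actually examined.

```sql
-- Assumes statistics named IX_SalesOrderDetail_ProductID; adjust as needed.
-- DENSITY_VECTOR limits the output to the density rows.
-- For the leading column, All density = 1 / (number of distinct values).
DBCC SHOW_STATISTICS ('Sales.SalesOrderDetail', IX_SalesOrderDetail_ProductID)
    WITH DENSITY_VECTOR;
```

If the histogram-based count of 266 is right, the ProductID row should show an All density of roughly 1/266, about 0.00376.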
With 121,317 rows in the table, I can estimate that each ProductID has about 456 rows on average. And when I look at the query plan for my test procedure (in XML format), I see something like:
So I know where the EstimateRows value comes from (accurate to three decimal places). Notice also that the ParameterCompiledValue attribute is missing from the query plan. This is exactly what a plan looks like when using 2008’s OPTIMIZE FOR UNKNOWN.
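The arithmetic behind that estimate, using the histogram figures above and taking the density as the reciprocal of the estimated number of distinct values:

```latex
\begin{aligned}
\text{distinct values} &\approx 200 + 66 = 266 \\
\text{density} &= \tfrac{1}{266} \approx 0.0037594 \\
\text{EstimateRows} &= 121317 \times \tfrac{1}{266} \approx 456.079
\end{aligned}
```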