I have this query… SELECT Distinct([TargetAttributeID]) FROM (SELECT distinct att1.intAttributeID as [TargetAttributeID] FROM AST_tblAttributes

Asked: June 5, 2026

I have this query…

SELECT Distinct([TargetAttributeID]) FROM
    (SELECT distinct att1.intAttributeID as [TargetAttributeID]
        FROM AST_tblAttributes att1
        INNER JOIN
        AST_lnkProfileDemandAttributes pda
        ON pda.intAttributeID=att1.intAttributeID AND pda.intProfileID = @intProfileID

    union all

    SELECT distinct ca2.intAttributeID as [TargetAttributeID] FROM
        AST_lnkCapturePolicyAttributes ca2
        INNER JOIN
        AST_lnkEmployeeCapture ec2 ON ec2.intAdminCaptureID = ca2.intAdminCaptureID AND ec2.intTeamID = 57
        WHERE ec2.dteCreatedDate >= @cutoffdate) x

Execution Plan for the above query

The two inner distincts are looking at 32 and 10,000 rows respectively. This query returns 5 rows and executes in under 1 second.

If I then use the result of this query as the subject of an IN like so…

SELECT attx.intAttributeID,attx.txtAttributeName,attx.txtAttributeLabel,attx.txtType,attx.txtEntity FROM
    AST_tblAttributes attx WHERE attx.intAttributeID 
    IN
    (SELECT Distinct([TargetAttributeID]) FROM
    (SELECT Distinct att1.intAttributeID as [TargetAttributeID]
        FROM AST_tblAttributes att1
        INNER JOIN
        AST_lnkProfileDemandAttributes pda
        ON pda.intAttributeID=att1.intAttributeID AND pda.intProfileID = @intProfileID
    union all
    SELECT  Distinct ca2.intAttributeID as [TargetAttributeID] FROM
        AST_lnkCapturePolicyAttributes ca2
        INNER JOIN
        AST_lnkEmployeeCapture ec2 ON ec2.intAdminCaptureID = ca2.intAdminCaptureID AND ec2.intTeamID = 57
        WHERE ec2.dteCreatedDate >= @cutoffdate) x)

Execution Plan for the above query

Then it takes over 3 minutes! If I just take the result of the query and perform the IN “manually” then again it comes back extremely quickly.

However, if I remove the two inner DISTINCTs…

SELECT attx.intAttributeID,attx.txtAttributeName,attx.txtAttributeLabel,attx.txtType,attx.txtEntity FROM
    AST_tblAttributes attx WHERE attx.intAttributeID 
    IN
    (SELECT Distinct([TargetAttributeID]) FROM
    (SELECT att1.intAttributeID as [TargetAttributeID]
        FROM AST_tblAttributes att1
        INNER JOIN
        AST_lnkProfileDemandAttributes pda
        ON pda.intAttributeID=att1.intAttributeID AND pda.intProfileID = @intProfileID
    union all
    SELECT ca2.intAttributeID as [TargetAttributeID] FROM
        AST_lnkCapturePolicyAttributes ca2
        INNER JOIN
        AST_lnkEmployeeCapture ec2 ON ec2.intAdminCaptureID = ca2.intAdminCaptureID AND ec2.intTeamID = 57
        WHERE ec2.dteCreatedDate >= @cutoffdate) x)

Execution Plan for the above query

…then it comes back in under a second.

What is SQL Server thinking? Can it not figure out that it can run the two subqueries once and use the result as the subject of the IN? It seems as slow as a correlated subquery, but it isn't correlated!!!

In the estimated execution plan there are three Clustered Index Scans, each with a cost of 100%! (Execution plan is here.)

Can anyone tell me why the inner DISTINCTs make this query so much slower (but only when used as the subject of an IN…)?

UPDATE

Sorry it’s taken me a while to get these execution plans up…

Query 1

Query 2 (The slow one)

Query 3 – No Inner Distincts

1 Answer

Editorial Team · Answer 1 · 2026-06-05T15:28:00+00:00

Honestly, I think it comes down to the fact that, in terms of relational operators, you have a gratuitously baroque query there, and SQL Server only allows itself a limited amount of time to search for alternative execution plans. It runs out of that time before it finds a good one.

After the parse and bind phase of plan compilation, SQL Server applies logical transformations to the resulting tree, estimates the cost of each alternative, and chooses the one with the lowest cost. It doesn't try every possible transformation, only as many as it can get through within a given window. So presumably it used up that window before it arrived at a good plan, and it's the addition of the outer semi-self-join on AST_tblAttributes that pushed it over the edge.

How is it gratuitously baroque? Well, first off, there’s this (simplified for noise reduction):

select distinct intAttributeID from (
   select distinct intAttributeID from AST_tblAttributes ....
   union all
   select distinct intAttributeID from AST_tblAttributes ....
   ) x

Concatenating two sets and projecting the unique elements? It turns out there's an operator for that: it's called UNION. So given enough time during plan compilation and enough logical transformations, SQL Server will realize that what you really mean is:

select intAttributeID from AST_tblAttributes ....
union
select intAttributeID from AST_tblAttributes ....
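If you want to convince yourself that the two shapes really are the same set, it's quick to check on toy data. This is a minimal sketch, assuming Python's built-in sqlite3 module as a stand-in for SQL Server. SQLite's optimizer is completely different, so this only shows that both forms return the same rows, not how either one gets planned. The tables profile_attrs and capture_attrs are hypothetical stand-ins for the two branches of the derived table in the question.

```python
import sqlite3
from contextlib import closing


def compare_distinct_forms() -> tuple[list[int], list[int]]:
    """Run the DISTINCT-over-UNION ALL shape and a plain UNION on the same data."""
    with closing(sqlite3.connect(":memory:")) as con:
        # Hypothetical toy tables; each contains duplicates and they overlap.
        con.executescript("""
            CREATE TABLE profile_attrs (attr_id INTEGER);
            CREATE TABLE capture_attrs (attr_id INTEGER);
            INSERT INTO profile_attrs VALUES (1), (2), (2), (3);
            INSERT INTO capture_attrs VALUES (2), (3), (3), (4), (5);
        """)
        # Shape from the question: DISTINCT in each branch and again outside.
        baroque = [row[0] for row in con.execute("""
            SELECT DISTINCT attr_id FROM (
                SELECT DISTINCT attr_id FROM profile_attrs
                UNION ALL
                SELECT DISTINCT attr_id FROM capture_attrs
            ) AS x
            ORDER BY attr_id
        """)]
        # The single set operator that means the same thing.
        plain = [row[0] for row in con.execute("""
            SELECT attr_id FROM profile_attrs
            UNION
            SELECT attr_id FROM capture_attrs
            ORDER BY attr_id
        """)]
    return baroque, plain


if __name__ == "__main__":
    print(compare_distinct_forms())
```

Both lists come back as the same five ids, so the three DISTINCTs are doing the job of one UNION.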

But wait, you put this in an IN subquery. Logically, IN is a semi-join, and the right-hand input of a semi-join does not need to be deduplicated: each outer row is returned at most once, no matter how many matches it has. So SQL Server may logically rewrite the query as this:

select * from AST_tblAttributes
where intAttributeID in (
  select intAttributeID from AST_tblAttributes ....
  union all
  select intAttributeID from AST_tblAttributes ....
  )

And then go about physical plan selection. But to get there, it has to see through the cruft first, and that may fall outside the optimization window.
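The semi-join point can also be checked on toy data. Again, this is a minimal sketch, assuming Python's sqlite3 as a stand-in for SQL Server, so it demonstrates only logical equivalence and says nothing about plans. The hypothetical attributes table plays the role of AST_tblAttributes, and profile_attrs and capture_attrs stand in for the two link-table branches.

```python
import sqlite3
from contextlib import closing

# Hypothetical schema and data; the subquery branches contain repeats on purpose.
SETUP = """
    CREATE TABLE attributes (attr_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE profile_attrs (attr_id INTEGER);
    CREATE TABLE capture_attrs (attr_id INTEGER);
    INSERT INTO attributes VALUES (1,'a'), (2,'b'), (3,'c'), (4,'d'), (5,'e'), (6,'f');
    INSERT INTO profile_attrs VALUES (1), (2), (2), (3);
    INSERT INTO capture_attrs VALUES (2), (3), (3), (4), (5);
"""


def semi_join_results():
    """Return (rows with inner DISTINCTs, rows without, size of undeduplicated RHS)."""
    with closing(sqlite3.connect(":memory:")) as con:
        con.executescript(SETUP)
        # Like the slow query: dedupe inside each branch of the IN subquery.
        with_distinct = con.execute("""
            SELECT attr_id, name FROM attributes
            WHERE attr_id IN (
                SELECT DISTINCT attr_id FROM profile_attrs
                UNION ALL
                SELECT DISTINCT attr_id FROM capture_attrs)
            ORDER BY attr_id
        """).fetchall()
        # The rewritten form: no dedupe at all on the right-hand side.
        without_distinct = con.execute("""
            SELECT attr_id, name FROM attributes
            WHERE attr_id IN (
                SELECT attr_id FROM profile_attrs
                UNION ALL
                SELECT attr_id FROM capture_attrs)
            ORDER BY attr_id
        """).fetchall()
        # The undeduplicated right-hand side really does contain repeats.
        rhs_rows = con.execute("""
            SELECT COUNT(*) FROM (
                SELECT attr_id FROM profile_attrs
                UNION ALL
                SELECT attr_id FROM capture_attrs)
        """).fetchone()[0]
    return with_distinct, without_distinct, rhs_rows


if __name__ == "__main__":
    print(semi_join_results())
```

Both IN queries return attributes 1 to 5 exactly once, even though the undeduplicated subquery produces 9 rows. The inner DISTINCTs add work without changing the answer.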

EDIT:

Really, the way to explore this for yourself, and corroborate the speculation above, is to put both versions of the query in the same window and compare estimated execution plans side-by-side (Ctrl-L in SSMS). Leave one as is, edit the other, and see what changes.

You will see that some alternate forms are recognized as logically equivalent and generate the same good plan, and others generate less optimal plans, as you bork the optimizer.**

Then, you can use SET STATISTICS IO ON and SET STATISTICS TIME ON to observe the actual amount of work SQL Server performs to execute the queries:

SET STATISTICS IO ON
SET STATISTICS TIME ON

SELECT ....
SELECT ....

SET STATISTICS IO OFF
SET STATISTICS TIME OFF

The output will appear in the messages pane.

** Or not: if they all generate the same plan but actual execution time still varies as you describe, something else may be going on (it's not unheard of). Try comparing actual execution plans and go from there.
