Given: Table y id int clustered index name nvarchar(25) Table anothertable id int clustered

Question

Editorial Team

Asked: May 26, 2026 (2026-05-26T04:20:00+00:00)

Given:

Table y

id int clustered index
name nvarchar(25)

Table anothertable

id int clustered index
name nvarchar(25)

Function SomeFunction

does some math, then returns a valid ID

Compare:

SELECT y.name
  FROM y
 WHERE dbo.SomeFunction(y.id) IN (SELECT anotherTable.id 
                                    FROM AnotherTable)

vs:

SELECT y.name 
  FROM y
  JOIN AnotherTable ON dbo.SomeFunction(y.id) = anotherTable.id

Question:

While timing these two queries, I found that on large data sets the first query, using IN, is much faster than the second query, using an INNER JOIN. I do not understand why. Can someone help explain, please?

Execution Plan


1 Answer

Editorial Team · Answer 1 · 2026-05-26T04:20:01+00:00

Generally speaking, IN differs from JOIN in that a JOIN can return additional rows when a row has more than one match in the joined table.
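
To make the difference concrete, here is a minimal sketch. The table variables @A and @B and their contents are made up for illustration and are not taken from the question:

```sql
-- Hypothetical sample data: @B contains the value 1 twice.
DECLARE @A TABLE (Col1 int);
DECLARE @B TABLE (Col1 int);

INSERT INTO @A (Col1) VALUES (1), (2);
INSERT INTO @B (Col1) VALUES (1), (1), (3);

-- IN acts as a semi join: each row of @A is returned at most once.
SELECT a.Col1
FROM @A AS a
WHERE a.Col1 IN (SELECT b.Col1 FROM @B AS b);   -- one row: 1

-- JOIN returns one row per matching pair of rows.
SELECT a.Col1
FROM @A AS a
JOIN @B AS b ON a.Col1 = b.Col1;                -- two rows: 1, 1
```

Run the snippet as a single batch, since table variables only live for the batch that declares them.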

From your estimated execution plan, though, it can be seen that in this case the two queries are semantically the same:

SELECT
    A.Col1,
    dbo.Foo(A.Col1),
    MAX(A.Col2)
FROM A
WHERE dbo.Foo(A.Col1) IN (SELECT Col1 FROM B)
GROUP BY
    A.Col1,
    dbo.Foo(A.Col1)

versus

SELECT
    A.Col1,
    dbo.Foo(A.Col1),
    MAX(A.Col2)
FROM A
JOIN B ON dbo.Foo(A.Col1) = B.Col1
GROUP BY
    A.Col1,
    dbo.Foo(A.Col1)

Even if the JOIN introduces duplicates, the GROUP BY collapses them, as it references only columns from the left-hand table. These duplicate rows also do not alter the result, because MAX(A.Col2) is unaffected by repeated values. This would not be the case for all aggregates, however. If you were to use SUM(A.Col2) or COUNT(*), the duplicates would change the result, and AVG can change as well when rows within a group are duplicated unevenly.
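
A small sketch of that effect, again with made-up table variables and values rather than the asker's data:

```sql
-- Hypothetical sample data: both rows of @A match two rows of @B.
DECLARE @A TABLE (Col1 int, Col2 int);
DECLARE @B TABLE (Col1 int);

INSERT INTO @A (Col1, Col2) VALUES (1, 10), (1, 20);
INSERT INTO @B (Col1) VALUES (1), (1);

-- IN: each row of @A is aggregated once.
SELECT a.Col1, MAX(a.Col2) AS MaxCol2, SUM(a.Col2) AS SumCol2
FROM @A AS a
WHERE a.Col1 IN (SELECT b.Col1 FROM @B AS b)
GROUP BY a.Col1;          -- Col1 = 1, MaxCol2 = 20, SumCol2 = 30

-- JOIN: each row of @A is aggregated twice, once per match in @B.
SELECT a.Col1, MAX(a.Col2) AS MaxCol2, SUM(a.Col2) AS SumCol2
FROM @A AS a
JOIN @B AS b ON a.Col1 = b.Col1
GROUP BY a.Col1;          -- Col1 = 1, MaxCol2 = 20, SumCol2 = 60
```

MAX is the same in both results while SUM doubles, which is why the two forms are only interchangeable for duplicate-insensitive aggregates.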

It seems that SQL Server has no logic to differentiate between aggregates such as MAX and those such as SUM, so quite possibly it expands out all the duplicates, aggregates them later, and simply does a lot more work.
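
If the JOIN form is needed, one possible workaround is to remove the duplicates from B before joining, so the join cannot multiply rows. This is only a sketch using the placeholder names from the queries above; the column aliases and the DistinctB alias are my own, and whether the optimizer actually produces a cheaper plan is not guaranteed:

```sql
SELECT
    A.Col1,
    dbo.Foo(A.Col1) AS FooCol1,
    MAX(A.Col2) AS MaxCol2
FROM A
-- Deduplicate B first so each row of A matches at most one row.
JOIN (SELECT DISTINCT Col1 FROM B) AS DistinctB
    ON dbo.Foo(A.Col1) = DistinctB.Col1
GROUP BY
    A.Col1,
    dbo.Foo(A.Col1);
```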

The estimated number of rows being aggregated is 2893.54 for IN vs 28271800 for JOIN, but these estimates won’t necessarily be very reliable, as the join predicate is non-sargable.
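
Since the estimates may be off, it can help to compare what each query actually does. A sketch using the standard SET STATISTICS session options against the tables from the question (assuming dbo.SomeFunction is a scalar function, as the question implies):

```sql
-- Report actual I/O and CPU/elapsed time per statement in the Messages output.
SET STATISTICS IO ON;
SET STATISTICS TIME ON;

SELECT y.name
FROM y
WHERE dbo.SomeFunction(y.id) IN (SELECT AnotherTable.id FROM AnotherTable);

SELECT y.name
FROM y
JOIN AnotherTable ON dbo.SomeFunction(y.id) = AnotherTable.id;

SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;
```

Looking at the actual execution plan alongside these figures shows the real row counts flowing into the aggregate, rather than the estimated ones.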
