I have two databases on a local machine, connected to localhost . They both

Asked: June 13, 2026

I have two databases on a local machine, connected to localhost. They both have roughly two million rows apiece. I ran the following very simple join, and it took over a minute to complete.

select distinct x.patid
    from [i 3 sci study].dbo.clm_extract as x
    left join [i 3 study].dbo.claims as y on y.patid=x.patid
    where y.patid is null

When I looked at the execution plan, I saw that the join showplan operator had this to say:
[Image: execution plan properties for the join operator]

Why is the actual number of rows so exorbitantly high compared to the actual number of rows in both tables?

1 Answer

Editorial Team · Answer 1 · 2026-06-13T15:23:49+00:00

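A LEFT JOIN pairs each row on the left with every row on the right that has the same patid, and only then applies the WHERE filter. If patid is not unique in either table, the number of matching combinations can get very high: a patid that appears m times on the left and n times on the right contributes m × n rows to the join output.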
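One rough way to check whether duplicate patid values explain the row count is to compute how many row pairs the join would produce, without running the join itself. The sketch below reuses the table names from the question; the aliases `cnt` and `matched_pairs` are made up for illustration, and the cast to bigint is only a precaution against integer overflow.

```sql
-- Count rows per patid on each side, then sum the per-patid products.
-- A patid occurring m times in clm_extract and n times in claims
-- contributes m * n rows to the join output.
SELECT SUM(CAST(x.cnt AS bigint) * y.cnt) AS matched_pairs
FROM (
    SELECT patid, COUNT(*) AS cnt
    FROM [i 3 sci study].dbo.clm_extract
    GROUP BY patid
) AS x
JOIN (
    SELECT patid, COUNT(*) AS cnt
    FROM [i 3 study].dbo.claims
    GROUP BY patid
) AS y ON y.patid = x.patid;
```

If `matched_pairs` is far larger than two million, duplicated patid values are the likely cause. Rows in clm_extract with no match also flow through the left join, so the operator's actual row count should be roughly `matched_pairs` plus the number of unmatched left rows.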

Try the following:

SET NOCOUNT ON;
GO
CREATE TABLE #t1 (Id INT NOT NULL);
CREATE TABLE #t2 (Id INT NOT NULL);
GO

INSERT #t1 (Id)
VALUES (1);
GO 100

INSERT #t2 (Id)
SELECT Id FROM #t1;
GO

Now look at the execution plan for the left join query form:

SELECT *
FROM #t1
LEFT OUTER JOIN #t2 ON #t1.Id = #t2.Id
WHERE #t2.Id IS NULL;

Looking at the execution plan, the hash join shows 10,000 actual rows (100 from #t1 × 100 from #t2), even though the query itself returns no rows. This shows the advantage of checking for existence (or a lack thereof) using any of the following T-SQL forms:

SELECT #t1.Id
FROM #t1
WHERE NOT EXISTS (SELECT * FROM #t2 WHERE #t2.Id = #t1.Id);

-- #t2.Id must not contain any NULLs for this to be correct
SELECT #t1.Id
FROM #t1
WHERE Id NOT IN (SELECT #t2.Id FROM #t2);

-- Returns DISTINCT #t1 values
SELECT Id
FROM #t1
EXCEPT
SELECT Id 
FROM #t2;

Checking for a lack of existence lets the engine short-circuit, because the optimizer can use an anti semi join: as soon as the first match for a row is found, it moves on to the next row instead of producing every matching pair. For more details, see this blog post.
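Applied to the query from the question, the NOT EXISTS form might look like the sketch below. It assumes the same table and column names as the original query and has not been run against the asker's data, so treat it as a starting point rather than a verified fix.

```sql
-- Patients present in clm_extract with no matching row in claims.
-- DISTINCT is kept because patid may repeat within clm_extract.
SELECT DISTINCT x.patid
FROM [i 3 sci study].dbo.clm_extract AS x
WHERE NOT EXISTS (
    SELECT 1
    FROM [i 3 study].dbo.claims AS y
    WHERE y.patid = x.patid
);
```

This should return the same patid values as the original LEFT JOIN ... IS NULL query, but the plan no longer needs to materialize every matching pair before filtering.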
