I have two databases on a local machine, connected to localhost. The two tables involved have roughly two million rows apiece. I ran the following very simple join and it took over a minute to complete.
select distinct x.patid
from [i 3 sci study].dbo.clm_extract as x
left join [i 3 study].dbo.claims as y on y.patid=x.patid
where y.patid is null
When I looked at the execution plan I saw that the join showplan operator had this to say

Why is the actual number of rows so exorbitantly high compared to the actual number of rows in both tables?
The LEFT JOIN will match each row on the left with every matching row on the right, and only then apply the filter. Assuming patid is not unique in either table, the number of match combinations can get very high. Try the following:
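The original demo script is not preserved here, so what follows is only a minimal sketch of a repro consistent with the figures quoted in this answer: two temp tables, #t1 and #t2, with 100 rows each. The column type, the single repeated patid value, and the use of sys.all_objects as a row source are all assumptions made for illustration.

```sql
-- Assumed repro, not the original script: 100 rows per table,
-- every row carrying the same patid so that all rows match each other.
IF OBJECT_ID('tempdb..#t1') IS NOT NULL DROP TABLE #t1;
IF OBJECT_ID('tempdb..#t2') IS NOT NULL DROP TABLE #t2;

CREATE TABLE #t1 (patid int NOT NULL);
CREATE TABLE #t2 (patid int NOT NULL);

-- sys.all_objects is used only as a convenient source of at least 100 rows.
INSERT INTO #t1 (patid)
SELECT TOP (100) 1
FROM sys.all_objects;

INSERT INTO #t2 (patid)
SELECT TOP (100) 1
FROM sys.all_objects;

-- Same shape as the query in the question, run against the temp tables.
SELECT DISTINCT x.patid
FROM #t1 AS x
LEFT JOIN #t2 AS y ON y.patid = x.patid
WHERE y.patid IS NULL;
```

Run it with the actual execution plan enabled so the row counts on the join operator are visible.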
Now look at the execution plan for the left join query form:
Looking at the execution plan, the hash join shows 10,000 actual rows (100 from #t1 x 100 from #t2). This shows the advantage of checking for existence (or a lack thereof) using any of the following T-SQL syntaxes:
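The list of syntaxes did not survive in this copy of the answer. As a sketch, these are three common ways to express "rows in #t1 with no match in #t2" as an existence check; whether they are the exact forms originally shown is an assumption. They assume #t1 and #t2 each have a patid column, as in the text above.

```sql
-- NOT EXISTS: correlated existence check.
SELECT DISTINCT x.patid
FROM #t1 AS x
WHERE NOT EXISTS (SELECT 1 FROM #t2 AS y WHERE y.patid = x.patid);

-- EXCEPT: set difference, which already returns distinct rows.
SELECT x.patid FROM #t1 AS x
EXCEPT
SELECT y.patid FROM #t2 AS y;

-- NOT IN: returns no rows at all if the subquery yields a NULL,
-- so use it only when #t2.patid cannot be NULL.
SELECT DISTINCT x.patid
FROM #t1 AS x
WHERE x.patid NOT IN (SELECT y.patid FROM #t2 AS y);
```

The three forms are not fully interchangeable when patid is nullable: EXCEPT treats two NULLs as equal, while NOT EXISTS and NOT IN use ordinary comparison semantics. NOT EXISTS is usually the safest default.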
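Checking for a lack of existence enables the engine to short circuit, because the query can be executed as an anti semi join: as soon as the first match for a row is found, that row is discarded and the engine moves on to the next record, instead of producing every matching combination. For more details, see this blog post.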