I have two tables, A and B. I want to join them on two columns, but only for selected rows of table A.
There are two ways I could write the query:
SELECT B.*
FROM B
INNER JOIN (SELECT * FROM A WHERE A.COLUMN1 BETWEEN somevalue1 AND somevalue2) C
ON B.COLUMN2 = C.COLUMN2
AND B.COLUMN3 = C.COLUMN3
OR
SELECT B.*
FROM B
INNER JOIN A
ON B.COLUMN2 = A.COLUMN2
AND B.COLUMN3 = A.COLUMN3
WHERE A.COLUMN1 BETWEEN somevalue1 AND somevalue2
Both tables A and B have millions of records. With the WHERE condition, table A returns only about 1000 rows, so the join really only has to find the matching rows in B for those 1000 rows of A.
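A quick way to confirm that the two forms are interchangeable is the minimal Python sketch below. It uses the standard-library sqlite3 module with an in-memory database. The extra PAYLOAD column, the synthetic data, and the LOW/HIGH bounds standing in for somevalue1/somevalue2 are all made up for illustration. The point is only that, for an INNER JOIN, filtering A inside a derived table or in the outer WHERE clause gives the same rows.

```python
import sqlite3

# Hypothetical schema: the three columns from the question plus a made-up PAYLOAD.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE A (COLUMN1 INTEGER, COLUMN2 INTEGER, COLUMN3 INTEGER)")
conn.execute("CREATE TABLE B (COLUMN2 INTEGER, COLUMN3 INTEGER, PAYLOAD TEXT)")

# Small synthetic data set, for illustration only.
conn.executemany(
    "INSERT INTO A VALUES (?, ?, ?)",
    [(i, i % 50, i % 7) for i in range(1000)],
)
conn.executemany(
    "INSERT INTO B VALUES (?, ?, ?)",
    [(i % 50, i % 7, f"row{i}") for i in range(2000)],
)

LOW, HIGH = 100, 199  # stand-ins for somevalue1 and somevalue2

# Option 1: filter A in a derived table, then join.
DERIVED = """
SELECT B.*
FROM B
INNER JOIN (SELECT * FROM A WHERE A.COLUMN1 BETWEEN ? AND ?) C
  ON B.COLUMN2 = C.COLUMN2
 AND B.COLUMN3 = C.COLUMN3
"""

# Option 2: join first, filter A in the WHERE clause.
WHERE_FILTER = """
SELECT B.*
FROM B
INNER JOIN A
  ON B.COLUMN2 = A.COLUMN2
 AND B.COLUMN3 = A.COLUMN3
WHERE A.COLUMN1 BETWEEN ? AND ?
"""

# Sort so the comparison ignores row order, which SQL does not guarantee.
rows_derived = sorted(conn.execute(DERIVED, (LOW, HIGH)).fetchall())
rows_where = sorted(conn.execute(WHERE_FILTER, (LOW, HIGH)).fetchall())

# Same rows, including duplicates when one B row matches several filtered A rows.
print(len(rows_derived), rows_derived == rows_where)
```

This only checks that the results agree. It says nothing about which form is faster on a given engine.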
Question: which one should be faster? (I do not have access to the query execution plan.)
Thanks!
It’s hard to predict performance here without actually measuring.
My instincts say the latter option should be faster. With the former, an optimizer may decide to fully materialize the inner query before the join. That is slow all by itself, and it could also keep the optimizer from using indexes that might help the join along. For the latter option, on the other hand, the optimizer should still be smart enough to pre-filter table A before the join. That carries no risk of losing the indexes and lets it materialize only the rows that match the join. Notice all the weasel words in there, though; my instincts could be way off in this case. The real lesson to take away from this is to measure your query using real data under conditions as close to production as possible.
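If you can't see the production execution plan, one option is to reproduce the shape of the problem locally and measure there. The sketch below again uses Python's sqlite3 module. The table sizes, the data distribution, and the index names idx_a_col1 and idx_b_join are hypothetical. SQLite's planner may also behave quite differently from whatever engine holds the real tables, so treat this as a template for measuring, not as an answer. For each query, it prints SQLite's EXPLAIN QUERY PLAN output and the best of five timings.

```python
import sqlite3
import time

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE A (COLUMN1 INTEGER, COLUMN2 INTEGER, COLUMN3 INTEGER)")
conn.execute("CREATE TABLE B (COLUMN2 INTEGER, COLUMN3 INTEGER, PAYLOAD TEXT)")

# Hypothetical data: 200k rows per table (use real data in practice).
N = 200_000
conn.executemany("INSERT INTO A VALUES (?, ?, ?)",
                 ((i, i % 5000, i % 13) for i in range(N)))
conn.executemany("INSERT INTO B VALUES (?, ?, ?)",
                 ((i % 5000, i % 13, "x") for i in range(N)))

# Indexes the optimizer could use: one for the range filter on A,
# and a composite index on B covering both join columns.
conn.execute("CREATE INDEX idx_a_col1 ON A (COLUMN1)")
conn.execute("CREATE INDEX idx_b_join ON B (COLUMN2, COLUMN3)")

QUERIES = {
    "derived table": """
        SELECT B.* FROM B
        INNER JOIN (SELECT * FROM A WHERE A.COLUMN1 BETWEEN ? AND ?) C
          ON B.COLUMN2 = C.COLUMN2 AND B.COLUMN3 = C.COLUMN3""",
    "where clause": """
        SELECT B.* FROM B
        INNER JOIN A
          ON B.COLUMN2 = A.COLUMN2 AND B.COLUMN3 = A.COLUMN3
        WHERE A.COLUMN1 BETWEEN ? AND ?""",
}
LOW, HIGH = 1000, 1999  # selects 1000 rows of A, as in the question


def plan(sql):
    # The fourth column of each EXPLAIN QUERY PLAN row is the readable detail text.
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql, (LOW, HIGH))]


def best_time(sql, repeats=5):
    # Best-of-N wall-clock time, fetching all rows so the whole join actually runs.
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        conn.execute(sql, (LOW, HIGH)).fetchall()
        best = min(best, time.perf_counter() - start)
    return best


for name, sql in QUERIES.items():
    print(name)
    for step in plan(sql):
        print("   ", step)
    print(f"    best of 5: {best_time(sql) * 1000:.2f} ms")
```

SQLite can usually flatten a simple subquery like C into the outer query, so the two plans may well come out the same there. Other engines can behave differently, which is why the measurement has to be repeated on the real system.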
More importantly, I prefer the latter because (imo) it’s just more readable and maintainable.