Recently, I’ve been asked to write a query to select properties of entities from a group that contains the maximum number of such entities. So, I did it on Northwind (MSFT distributed sample) database in a couple of ways.
ONE:
SELECT cat.CategoryName, prod.ProductName
FROM Categories cat
JOIN Products prod ON cat.CategoryID = prod.CategoryID
JOIN (SELECT TOP 1 p.CategoryID, COUNT(p.ProductId) as products
FROM Categories c
JOIN Products p on c.CategoryID = p.CategoryID
GROUP BY p.CategoryID
ORDER BY products desc) c ON c.CategoryID = cat.CategoryID
TWO:
SELECT cat.CategoryName, prod.ProductName
FROM Categories cat
JOIN Products prod ON cat.CategoryID = prod.CategoryID
JOIN (SELECT CategoryID, COUNT(ProductID) m_count
FROM Products
GROUP BY CategoryID
HAVING COUNT(ProductID) = (SELECT MAX(sub.cnt)
FROM (SELECT CategoryId, COUNT(ProductID) cnt
FROM Products
GROUP BY CategoryId) sub)) m ON m.CategoryID = cat.CategoryID
The question is: which is faster any why? In execution plans nothing stands out, in particular. The elapsed time varies slightly, but is the same, approximately. The database is, of course, tiny.
A tiny database makes it difficult to determine which is better, but SQL Server Management Studio has functionality to compare the efficiency of statements to one another.
The query cost is averaged by the number of queries run. So if comparing the two queries provided as examples, if both have a cost of 50% then they are equivalent (because 100 / 2 = 50, etc). When there’s a difference, you can mouseover the SELECT to review the subtree cost, besides looking at the graphical layout of the Execution path.
Why is one faster than the other?
That depends on the database — the data types being joined (are they as narrow as they could be? "narrow" means taking less bytes to store), indexes, and what is being performed in the query. Using different syntax can make all the difference.