I read in a Microsoft T-SQL Performance Tuning whitepaper that correlated sub-queries can be costly in terms of performance on a large table:
…Compare this to the first solution that would scan the whole table and execute a correlated subquery for every row. The difference in performance is negligible on a small table. But on a large table it may amount to hours of processing time…
Is there a general way to convert a query that computes several aggregates, each over different criteria, as correlated sub-queries into a single query that uses JOINs instead?
Consider an example:
Prepare the schema:
CREATE TABLE Student (
    ID INT NOT NULL PRIMARY KEY IDENTITY(1,1),
    Name NVARCHAR(255) NOT NULL
);
CREATE TABLE Grade (
    ID INT NOT NULL PRIMARY KEY IDENTITY(1,1),
    StudentID INT NOT NULL FOREIGN KEY REFERENCES Student(ID),
    Score INT NOT NULL,
    CONSTRAINT CK_Grade_Score CHECK (Score >= 0 AND Score <= 100)
);
INSERT INTO Student (Name) VALUES ('Steven');
INSERT INTO Student (Name) VALUES ('Timmy');
INSERT INTO Student (Name) VALUES ('Maria');
INSERT INTO Grade (StudentID, Score) VALUES (1, 90);
INSERT INTO Grade (StudentID, Score) VALUES (1, 81);
INSERT INTO Grade (StudentID, Score) VALUES (1, 82);
INSERT INTO Grade (StudentID, Score) VALUES (1, 82);
INSERT INTO Grade (StudentID, Score) VALUES (2, 99);
INSERT INTO Grade (StudentID, Score) VALUES (2, 63);
INSERT INTO Grade (StudentID, Score) VALUES (2, 97);
INSERT INTO Grade (StudentID, Score) VALUES (2, 90);
INSERT INTO Grade (StudentID, Score) VALUES (3, 66);
INSERT INTO Grade (StudentID, Score) VALUES (3, 61);
INSERT INTO Grade (StudentID, Score) VALUES (3, 60);
The query in question:
SELECT Name,
    (SELECT AVG(Score) FROM Grade WHERE StudentID = Student.ID AND Score < 65) AS 'F',
    (SELECT AVG(Score) FROM Grade WHERE StudentID = Student.ID AND Score >= 65 AND Score < 70) AS 'D',
    (SELECT AVG(Score) FROM Grade WHERE StudentID = Student.ID AND Score >= 70 AND Score < 80) AS 'C',
    (SELECT AVG(Score) FROM Grade WHERE StudentID = Student.ID AND Score >= 80 AND Score < 90) AS 'B',
    (SELECT AVG(Score) FROM Grade WHERE StudentID = Student.ID AND Score >= 90 AND Score <= 100) AS 'A'
FROM Student;
Produces the following result:
Name    F     D     C     B     A
------  ----  ----  ----  ----  ----
Steven  NULL  NULL  NULL  81    90
Timmy   63    NULL  NULL  NULL  95
Maria   60    66    NULL  NULL  NULL
I am aware of the technique you can use for counting, where you perform a single SELECT with a JOIN and then use a CASE expression to add 1 to a counter only when the joined row satisfies your condition. I am looking for a similar technique that can be applied to other types of aggregation (as opposed to just COUNT).
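For reference, here is a minimal sketch of the counting technique I mean, written against the Student and Grade tables above. The table aliases (s, g) and column aliases (FCount, DCount, and so on) are only illustrative names, and I am assuming a LEFT JOIN so that students without any grades still show up with zero counts.

```sql
-- Sketch only: count grades per letter bucket in a single pass over the join.
-- Each CASE expression yields 1 when the joined Grade row falls in the bucket
-- and 0 otherwise, so SUM() acts as a conditional counter.
SELECT s.Name,
       SUM(CASE WHEN g.Score < 65 THEN 1 ELSE 0 END)                   AS FCount,
       SUM(CASE WHEN g.Score >= 65 AND g.Score < 70 THEN 1 ELSE 0 END) AS DCount,
       SUM(CASE WHEN g.Score >= 70 AND g.Score < 80 THEN 1 ELSE 0 END) AS CCount,
       SUM(CASE WHEN g.Score >= 80 AND g.Score < 90 THEN 1 ELSE 0 END) AS BCount,
       SUM(CASE WHEN g.Score >= 90 THEN 1 ELSE 0 END)                  AS ACount
FROM Student AS s
LEFT JOIN Grade AS g
       ON g.StudentID = s.ID   -- the join keys line up here
GROUP BY s.ID, s.Name
ORDER BY s.ID;
```

On the sample data this should give Steven 3 in BCount and 1 in ACount, Timmy 1 in FCount and 3 in ACount, and Maria 2 in FCount and 1 in DCount, with 0 everywhere else. The ELSE 0 is what makes this specific to counting: a zero is harmless in a SUM, but it would distort an aggregate such as AVG or MIN.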
Maybe I’m missing something, but the solution using CASE works for other aggregates as well: