I need to pull multiple columns from a subquery which also requires a WHERE filter referencing columns of the FROM table. I have a couple of questions about this:
- Is there another solution to this problem besides mine below?
- Is another solution even necessary or is this solution efficient enough?
Example:
In the following example I’m writing a view to present test scores, particularly to discover failures that may need to be addressed or retaken.
I cannot simply use JOIN because I need to filter my actual subquery first (notice I’m getting TOP 1 for the “examinee”, sorted either by score or date descending)
My goal is to avoid writing (and executing) essentially the same subquery repeatedly.
SELECT ExamineeID, LastName, FirstName, Email,
(SELECT COUNT(examineeTestID)
FROM exam.ExamineeTest tests
WHERE E.ExamineeID = ExamineeID AND TestRevisionID = 3 AND TestID = 2) Attempts,
(SELECT TOP 1 ExamineeTestID
FROM exam.ExamineeTest T
WHERE E.ExamineeID = ExamineeID AND TestRevisionID = 3 AND TestID = 2
ORDER BY Score DESC) bestExamineeTestID,
(SELECT TOP 1 Score
FROM exam.ExamineeTest T
WHERE E.ExamineeID = ExamineeID AND TestRevisionID = 3 AND TestID = 2
ORDER BY Score DESC) bestScore,
(SELECT TOP 1 DateDue
FROM exam.ExamineeTest T
WHERE E.ExamineeID = ExamineeID AND TestRevisionID = 3 AND TestID = 2
ORDER BY Score DESC) bestDateDue,
(SELECT TOP 1 TimeCommitted
FROM exam.ExamineeTest T
WHERE E.ExamineeID = ExamineeID AND TestRevisionID = 3 AND TestID = 2
ORDER BY Score DESC) bestTimeCommitted,
(SELECT TOP 1 ExamineeTestID
FROM exam.ExamineeTest T
WHERE E.ExamineeID = ExamineeID AND TestRevisionID = 3 AND TestID = 2
ORDER BY DateDue DESC) currentExamineeTestID,
(SELECT TOP 1 Score
FROM exam.ExamineeTest T
WHERE E.ExamineeID = ExamineeID AND TestRevisionID = 3 AND TestID = 2
ORDER BY DateDue DESC) currentScore,
(SELECT TOP 1 DateDue
FROM exam.ExamineeTest T
WHERE E.ExamineeID = ExamineeID AND TestRevisionID = 3 AND TestID = 2
ORDER BY DateDue DESC) currentDateDue,
(SELECT TOP 1 TimeCommitted
FROM exam.ExamineeTest T
WHERE E.ExamineeID = ExamineeID AND TestRevisionID = 3 AND TestID = 2
ORDER BY DateDue DESC) currentTimeCommitted
FROM exam.Examinee E
To answer your second question first: yes, a better way is in order. The query you’re using is hard to understand and hard to maintain, and even if its performance is acceptable now, it’s a shame to query the same table multiple times when you don’t need to. The performance may also stop being acceptable if your application ever grows to an appreciable size.
To answer your first question, I have a few methods for you. These assume SQL Server 2005 or later unless noted otherwise.
Note that you don’t need BestExamineeID and CurrentExamineeID because they will always be the same as ExamineeID unless no tests were taken and they’re NULL, which you can tell from the other columns being NULL.
You can think of OUTER/CROSS APPLY as an operator that lets you move correlated subqueries from the WHERE clause into the JOIN clause. They can have an outer reference to a previously-named table, and can return more than one column. This enables you to do the job only once per logical query rather than once for each column.
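A minimal sketch of how that could look for the query in the question. Table and column names come from the question; the aliases B and C are illustrative, and nothing here has been run against real tables. It keeps the ExamineeTestID columns the question selects, so drop them if you do not need them.

```sql
-- Sketch only: names are taken from the question, aliases B and C are illustrative.
SELECT
    E.ExamineeID, E.LastName, E.FirstName, E.Email,
    COALESCE(B.Attempts, 0) AS Attempts,   -- OUTER APPLY yields NULL when there are no attempts
    B.ExamineeTestID AS bestExamineeTestID,
    B.Score          AS bestScore,
    B.DateDue        AS bestDateDue,
    B.TimeCommitted  AS bestTimeCommitted,
    C.ExamineeTestID AS currentExamineeTestID,
    C.Score          AS currentScore,
    C.DateDue        AS currentDateDue,
    C.TimeCommitted  AS currentTimeCommitted
FROM exam.Examinee E
OUTER APPLY (
    -- Highest-scoring attempt. COUNT(*) OVER () is evaluated before TOP,
    -- so it counts every matching attempt, not just the one row returned.
    SELECT TOP 1
        T.ExamineeTestID, T.Score, T.DateDue, T.TimeCommitted,
        COUNT(*) OVER () AS Attempts
    FROM exam.ExamineeTest T
    WHERE T.ExamineeID = E.ExamineeID
      AND T.TestRevisionID = 3
      AND T.TestID = 2
    ORDER BY T.Score DESC
) B
OUTER APPLY (
    -- Most recent attempt by due date.
    SELECT TOP 1
        T.ExamineeTestID, T.Score, T.DateDue, T.TimeCommitted
    FROM exam.ExamineeTest T
    WHERE T.ExamineeID = E.ExamineeID
      AND T.TestRevisionID = 3
      AND T.TestID = 2
    ORDER BY T.DateDue DESC
) C;
```

OUTER APPLY (rather than CROSS APPLY) keeps examinees who have no attempts at all, which matches the behavior of the scalar subqueries in the question.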
You should experiment to see if my Count(*) OVER () is better than having an additional OUTER APPLY that just gets the count. If you’re not restricting the Examinee from the exam.Examinee table, it may be better to just do a normal aggregate in a derived table.
Here’s another method that (sort of) goes and gets all the data in one swoop. It could conceivably perform better than the other queries, but in my experience windowing functions can get surprisingly expensive in some situations, so testing is in order.
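One way such a single-pass query might be written is with ROW_NUMBER() in a common table expression. This is a sketch under assumptions: the CTE name Ranked and the rank column names are made up for illustration, and the MAX(CASE ...) pivot assumes every selected column has a type that MAX accepts (for example, not bit).

```sql
-- Sketch only: Ranked, ScoreRank and DateRank are illustrative names.
WITH Ranked AS (
    SELECT
        T.ExamineeID, T.ExamineeTestID, T.Score, T.DateDue, T.TimeCommitted,
        COUNT(*)     OVER (PARTITION BY T.ExamineeID) AS Attempts,
        ROW_NUMBER() OVER (PARTITION BY T.ExamineeID ORDER BY T.Score   DESC) AS ScoreRank,
        ROW_NUMBER() OVER (PARTITION BY T.ExamineeID ORDER BY T.DateDue DESC) AS DateRank
    FROM exam.ExamineeTest T
    WHERE T.TestRevisionID = 3
      AND T.TestID = 2
)
SELECT
    E.ExamineeID, E.LastName, E.FirstName, E.Email,
    COALESCE(MAX(R.Attempts), 0) AS Attempts,
    -- Each CASE picks its value from the single row holding rank 1.
    MAX(CASE WHEN R.ScoreRank = 1 THEN R.ExamineeTestID END) AS bestExamineeTestID,
    MAX(CASE WHEN R.ScoreRank = 1 THEN R.Score          END) AS bestScore,
    MAX(CASE WHEN R.ScoreRank = 1 THEN R.DateDue        END) AS bestDateDue,
    MAX(CASE WHEN R.ScoreRank = 1 THEN R.TimeCommitted  END) AS bestTimeCommitted,
    MAX(CASE WHEN R.DateRank  = 1 THEN R.ExamineeTestID END) AS currentExamineeTestID,
    MAX(CASE WHEN R.DateRank  = 1 THEN R.Score          END) AS currentScore,
    MAX(CASE WHEN R.DateRank  = 1 THEN R.DateDue        END) AS currentDateDue,
    MAX(CASE WHEN R.DateRank  = 1 THEN R.TimeCommitted  END) AS currentTimeCommitted
FROM exam.Examinee E
LEFT JOIN Ranked R
    ON  R.ExamineeID = E.ExamineeID
    AND (R.ScoreRank = 1 OR R.DateRank = 1)   -- at most two rows per examinee
GROUP BY E.ExamineeID, E.LastName, E.FirstName, E.Email;
```

Like TOP 1, ROW_NUMBER() breaks ties arbitrarily unless you add further columns to its ORDER BY.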
Finally, here’s a SQL 2000 method:
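Sketched under the same assumptions about table and column names, a GROUP BY version that avoids APPLY and windowing functions could look something like this. The derived-table alias A and its column names are illustrative.

```sql
-- Sketch only: aggregate once per examinee, then join back to fetch the matching rows.
SELECT
    E.ExamineeID, E.LastName, E.FirstName, E.Email,
    COALESCE(A.Attempts, 0) AS Attempts,
    B.ExamineeTestID AS bestExamineeTestID,
    B.Score          AS bestScore,
    B.DateDue        AS bestDateDue,
    B.TimeCommitted  AS bestTimeCommitted,
    C.ExamineeTestID AS currentExamineeTestID,
    C.Score          AS currentScore,
    C.DateDue        AS currentDateDue,
    C.TimeCommitted  AS currentTimeCommitted
FROM exam.Examinee E
LEFT JOIN (
    SELECT ExamineeID,
           COUNT(*)     AS Attempts,
           MAX(Score)   AS MaxScore,
           MAX(DateDue) AS MaxDateDue
    FROM exam.ExamineeTest
    WHERE TestRevisionID = 3
      AND TestID = 2
    GROUP BY ExamineeID
) A ON A.ExamineeID = E.ExamineeID
LEFT JOIN exam.ExamineeTest B          -- row(s) holding the best score
    ON  B.ExamineeID = A.ExamineeID
    AND B.TestRevisionID = 3
    AND B.TestID = 2
    AND B.Score = A.MaxScore
LEFT JOIN exam.ExamineeTest C          -- row(s) holding the latest due date
    ON  C.ExamineeID = A.ExamineeID
    AND C.TestRevisionID = 3
    AND C.TestID = 2
    AND C.DateDue = A.MaxDateDue;
```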
This query will return unexpected extra rows if the combination of (ExamineeID, Score) or (ExamineeID, DateDue) can match multiple rows. With Score that is quite likely. If neither combination is unique, you need to use (or add) some additional column that guarantees uniqueness so that it can be used to select one row. If only Score can be duplicated, then an additional pre-query that gets the max Score first and then dovetails in the max DateDue would pull the most recent of the attempts tied for the highest score, at the same time as getting the most recent data. Let me know if you need more SQL 2000 help.
Note: The biggest factors in whether CROSS APPLY or a ROW_NUMBER() solution performs better are whether you have an index on the columns being looked up and whether the data is dense or sparse.
The GROUP BY solution that I gave for SQL 2000 will probably perform the worst, but that is not guaranteed. Like I said, testing is in order.
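As an illustration of the kind of index meant here, something along these lines might let each per-examinee TOP 1 lookup be answered with a short seek. The index names, key order, and included columns are assumptions, not a tested recommendation, and INCLUDE requires SQL Server 2005 or later.

```sql
-- Hypothetical indexes: names and column choices are assumptions for illustration.
-- Supports "best score" lookups per examinee and test.
CREATE INDEX IX_ExamineeTest_Examinee_Test_Score
    ON exam.ExamineeTest (ExamineeID, TestID, TestRevisionID, Score DESC)
    INCLUDE (ExamineeTestID, DateDue, TimeCommitted);

-- Supports "most recent by due date" lookups per examinee and test.
CREATE INDEX IX_ExamineeTest_Examinee_Test_DateDue
    ON exam.ExamineeTest (ExamineeID, TestID, TestRevisionID, DateDue DESC)
    INCLUDE (ExamineeTestID, Score, TimeCommitted);
```

Compare the execution plans with and without the indexes before keeping them, since every extra index also adds cost to writes on the table.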
If any of my queries do give performance problems let me know and I’ll see what I can do to help. I’m sure I probably have typos as I didn’t work up any DDL to recreate your tables, but I did my best without trying it.
If performance really does become crucial, I would create ExamineeTestBest and ExamineeTestCurrent tables that are kept up to date by a trigger on the ExamineeTest table. However, this is denormalization, and it is probably neither necessary nor a good idea unless you’ve scaled so large that retrieving results becomes unacceptably slow.