Given a table of responses with columns:
Username, LessonNumber, QuestionNumber, Response, Score, Timestamp
How would I run a query that returns which users got a score of 90 or better on their first attempt at every question in their last 5 lessons? “Last 5 lessons” is a limiting condition, rather than a requirement, so if they completed only 1 lesson, but got all of their first attempts for each question right, then they should be included in the results. We just don’t want to look back farther than 5 lessons.
About the data: Users may be on different lessons. Some users may have not yet completed five lessons (may only be on lesson 3 for example). Each lesson has a different number of questions. Users have different lesson paths, so they may skip some lesson numbers or even complete lessons out of sequence.
Since this seems to be a problem of transforming temporally non-uniform/discontinuous values into uniform/contiguous values per-user, I think I can solve the bulk of the problem with a couple ranking function calls. The conditional specification of scoring above 90 for “first attempt at every question in their last 5 lessons” is also tricky, because the number of questions completed is variable per-user.
So far…
As a starting point or hint at what may need to happen, I’ve transformed Timestamp into an “AttemptNumber” for each question, by using “row_number() over (partition by Username,LessonNumber,QuestionNumber order by Timestamp) as AttemptNumber”.
I’m also trying to transform LessonNumber from an absolute value into a contiguous ranked value for individual users. I could use “dense_rank() over (partition by Username order by LessonNumber desc) as LessonRank”, but that assumes the order lessons are completed corresponds with the order of LessonNumber, which is unfortunately not always the case. However, let’s assume that this is the case, since I do have a way of producing such a number through a couple of joins, so I can use the dense_rank transform described to select the “last 5 completed lessons” (i.e. LessonRank <= 5).
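To make the two transforms concrete, here is a minimal sketch that runs them against SQLite through Python's sqlite3 module. The sample rows, the `ranked_rows` helper and the omission of the Response column are assumptions made for illustration only, and it assumes the bundled SQLite is 3.25 or newer (needed for window functions). The window-function syntax itself is the same as quoted above.

```python
import sqlite3

# Made-up sample rows: (Username, LessonNumber, QuestionNumber, Score, Timestamp)
ROWS = [
    ("ann", 1, 1, 50, "2020-01-01 10:00"),
    ("ann", 1, 1, 95, "2020-01-01 10:05"),  # second attempt at the same question
    ("ann", 3, 1, 92, "2020-01-02 09:00"),  # lesson 2 was skipped
    ("bob", 7, 1, 91, "2020-01-01 08:00"),
]

RANKING_SQL = """
SELECT Username, LessonNumber, QuestionNumber, Score,
       ROW_NUMBER() OVER (PARTITION BY Username, LessonNumber, QuestionNumber
                          ORDER BY Timestamp) AS AttemptNumber,
       DENSE_RANK() OVER (PARTITION BY Username
                          ORDER BY LessonNumber DESC) AS LessonRank
FROM Responses
ORDER BY Username, LessonNumber, QuestionNumber, AttemptNumber
"""

def ranked_rows(rows):
    """Load the rows into an in-memory table and return them with both ranks."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE Responses (Username TEXT, LessonNumber INT, "
        "QuestionNumber INT, Score INT, Timestamp TEXT)"
    )
    conn.executemany("INSERT INTO Responses VALUES (?, ?, ?, ?, ?)", rows)
    result = conn.execute(RANKING_SQL).fetchall()
    conn.close()
    return result

for row in ranked_rows(ROWS):
    print(row)
```

Note that ann's lesson 3 gets LessonRank 1 and lesson 1 gets LessonRank 2, so the skipped lesson number leaves no gap in the rank.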
For the >= 90 condition, I think I can transform the score into an integer so that it’s “1” if >= 90, and “0” if < 90. I can then introduce a clause like “group by Username having SUM(Score)=COUNT(Score)”, which will select only those users with all scores equal to 1.
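A small sketch of that SUM = COUNT idea in isolation, again run on SQLite via Python's sqlite3. The `FirstAttempts` table, the `Passed` column name and the sample scores are hypothetical; the table stands in for rows already filtered to first attempts within the last 5 lessons.

```python
import sqlite3

# Made-up first-attempt scores: (Username, Score)
FIRST_ATTEMPTS = [("ann", 95), ("ann", 90), ("bob", 100), ("bob", 89)]

ALL_PASSED_SQL = """
SELECT Username
FROM (SELECT Username,
             CASE WHEN Score >= 90 THEN 1 ELSE 0 END AS Passed
      FROM FirstAttempts) AS flagged
GROUP BY Username
HAVING SUM(Passed) = COUNT(Passed)   -- every flag is 1
ORDER BY Username
"""

def users_with_all_passed(rows):
    """Return the users whose every listed score is 90 or better."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE FirstAttempts (Username TEXT, Score INT)")
    conn.executemany("INSERT INTO FirstAttempts VALUES (?, ?)", rows)
    users = [r[0] for r in conn.execute(ALL_PASSED_SQL)]
    conn.close()
    return users

print(users_with_all_passed(FIRST_ATTEMPTS))
```

Here bob drops out because one of his two flags is 0, so his SUM (1) no longer matches his COUNT (2).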
Any solutions or suggestions would be appreciated.
You kind of gave away the solution:
Concerning the LessonRank, I used exactly what you described, since it is not clear how to order the lessons otherwise: The timestamp of the first attempt of the first question of a lesson? Or the timestamp of the first attempt of any question of a lesson? Or simply the first (or the most recent?) timestamp of any result of any question of a lesson?
The innermost Select adds the AttemptNumber and LessonRank columns as provided by you.
The next Select retains only the results which would disqualify a user from being in the final list: all first attempts with an insufficient score in the last 5 lessons. We end up with a list of users we do not want to display in the final result.
Therefore, in the outermost Select, we can select all the users which are not in the exclusion list, i.e. basically all the other users which have answered any question.
EDIT: As so often, second try should be better…
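As a rough sketch of the three nested Selects described above (not necessarily the exact query, and written for SQLite via Python's sqlite3 rather than your DBMS), something along these lines should work. The sample rows, the `qualifying_users` helper and the omission of the Response column are assumptions for illustration; SQLite 3.25 or newer is assumed for the window functions.

```python
import sqlite3

# Made-up sample rows: (Username, LessonNumber, QuestionNumber, Score, Timestamp)
ROWS = [
    ("ann", 1, 1, 95, "2020-01-01 10:00"),
    ("ann", 1, 2, 90, "2020-01-01 10:10"),
    ("bob", 1, 1, 60, "2020-01-01 11:00"),  # first attempt below 90
    ("bob", 1, 1, 99, "2020-01-01 11:05"),  # the retry does not rescue bob
    ("cat", 1, 1, 10, "2020-01-01 12:00"),  # bad score, but 6 lessons back
] + [("cat", n, 1, 95, f"2020-01-0{n} 12:00") for n in range(2, 7)]

EXCLUSION_SQL = """
SELECT DISTINCT Username                 -- outermost: everyone not excluded
FROM Responses
WHERE Username NOT IN (
    SELECT Username                      -- middle: the exclusion list
    FROM (
        SELECT Username, Score,          -- innermost: add both ranks
               ROW_NUMBER() OVER (PARTITION BY Username, LessonNumber, QuestionNumber
                                  ORDER BY Timestamp) AS AttemptNumber,
               DENSE_RANK() OVER (PARTITION BY Username
                                  ORDER BY LessonNumber DESC) AS LessonRank
        FROM Responses
    ) AS ranked
    WHERE AttemptNumber = 1 AND LessonRank <= 5 AND Score < 90
)
ORDER BY Username
"""

def qualifying_users(rows):
    """Return users with no failed first attempt in their last 5 lessons."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE Responses (Username TEXT, LessonNumber INT, "
        "QuestionNumber INT, Score INT, Timestamp TEXT)"
    )
    conn.executemany("INSERT INTO Responses VALUES (?, ?, ?, ?, ?)", rows)
    users = [r[0] for r in conn.execute(EXCLUSION_SQL)]
    conn.close()
    return users

print(qualifying_users(ROWS))
```

With this data cat stays in the result, because the failed first attempt sits in a lesson with LessonRank 6. One caveat with NOT IN: if Username can be NULL, the exclusion subquery may make the outer filter return nothing, so NOT EXISTS may be the safer form.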
One more EDIT:
Here’s a version including your remarks in the comments.
I used a Having clause with a MIN expression on the calculated QuestionScoredWell value.
When comparing the execution plans for both queries, this query is actually faster. Not sure though whether this is partially due to the low number of data rows in my table.
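A sketch of what such a Having/MIN query could look like, run on SQLite via Python's sqlite3 (3.25 or newer assumed for window functions). Only the QuestionScoredWell name comes from the description above; the sample rows, the `users_scoring_well` helper and the omission of the Response column are made up for illustration, and no timing claims are attached to it.

```python
import sqlite3

# Made-up sample rows: (Username, LessonNumber, QuestionNumber, Score, Timestamp)
ROWS = [
    ("ann", 1, 1, 95, "2020-01-01 10:00"),
    ("ann", 1, 2, 90, "2020-01-01 10:10"),
    ("bob", 1, 1, 60, "2020-01-01 11:00"),  # first attempt below 90
    ("bob", 1, 1, 99, "2020-01-01 11:05"),  # later attempt is ignored
    ("cat", 1, 1, 10, "2020-01-01 12:00"),  # bad score, but 6 lessons back
] + [("cat", n, 1, 95, f"2020-01-0{n} 12:00") for n in range(2, 7)]

HAVING_MIN_SQL = """
SELECT Username
FROM (
    SELECT Username,
           CASE WHEN Score >= 90 THEN 1 ELSE 0 END AS QuestionScoredWell,
           ROW_NUMBER() OVER (PARTITION BY Username, LessonNumber, QuestionNumber
                              ORDER BY Timestamp) AS AttemptNumber,
           DENSE_RANK() OVER (PARTITION BY Username
                              ORDER BY LessonNumber DESC) AS LessonRank
    FROM Responses
) AS ranked
WHERE AttemptNumber = 1 AND LessonRank <= 5
GROUP BY Username
HAVING MIN(QuestionScoredWell) = 1   -- a single 0 flag disqualifies the user
ORDER BY Username
"""

def users_scoring_well(rows):
    """Return users whose first attempts in their last 5 lessons all scored >= 90."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE Responses (Username TEXT, LessonNumber INT, "
        "QuestionNumber INT, Score INT, Timestamp TEXT)"
    )
    conn.executemany("INSERT INTO Responses VALUES (?, ?, ?, ?, ?)", rows)
    users = [r[0] for r in conn.execute(HAVING_MIN_SQL)]
    conn.close()
    return users

print(users_scoring_well(ROWS))
```

This form reads the table once and needs no exclusion list, which may be part of why it comes out cheaper in an execution plan.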