I have a complex database model set up in Django, and I have to do a number of calculations based on filtered data. I have a Test object, a UserProfile object, and a TestAttempt object (with a foreign key back to a Test and a foreign key back to a UserProfile). There is a method that I run on a TestAttempt that calculates the test score (based on a number of user-supplied choices compared to the correct answers associated with each test), and another method that I run on a Test that calculates the average test score based on each of its associated TestAttempts. But sometimes I only want the average based on a supplied subset of the associated TestAttempts that are linked with a particular set of UserProfiles. So instead of calculating the average test score for a particular test this way:
[x.score() for x in self.test_attempts.all()]
and then averaging these values.
I do a query like this:
[x.score() for x in self.test_attempts.filter(profile__id__in=user_id_list).all()]
where user_id_list is a list of the UserProfile ids for which I want to find the average test score. My question is this: if user_id_list is in fact the entire set of UserProfiles (so the filter returns the same rows as self.test_attempts.all()), and most of the time this will be the case, does it pay to check for this case and, if so, not execute the filter at all? Or is the __in lookup efficient enough that, even if user_id_list contains all users, it’s more efficient to just run the filter? Also, do I need to worry about making the resulting test_attempts distinct(), or can they not possibly turn up duplicates with the structure of my queryset?
EDIT: For anyone who’s interested in looking at the raw SQL query, it looks like this without the filter:
SELECT "mc_grades_testattempt"."id", "mc_grades_testattempt"."date",
"mc_grades_testattempt"."test_id", "mc_grades_testattempt"."student_id" FROM
"mc_grades_testattempt" WHERE "mc_grades_testattempt"."test_id" = 1
and this with the filter:
SELECT "mc_grades_testattempt"."id", "mc_grades_testattempt"."date",
"mc_grades_testattempt"."test_id", "mc_grades_testattempt"."student_id" FROM
"mc_grades_testattempt" INNER JOIN "mc_grades_userprofile" ON
("mc_grades_testattempt"."student_id" = "mc_grades_userprofile"."id") WHERE
("mc_grades_testattempt"."test_id" = 1 AND "mc_grades_userprofile"."user_id" IN (1, 2, 3))
Note that the list (1, 2, 3) is just an example.
The short answer is: benchmark. Test it in different situations and measure the load. That will give you the best answer.
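As a rough way to run such a benchmark, here is a minimal sketch using the standard-library `timeit` module. The helper name `best_time` and the two stand-in functions are hypothetical; in the real project each callable would evaluate one of the two querysets (for example by wrapping it in `list(...)`, since querysets are lazy).

```python
import timeit


def best_time(fn, repeat=5, number=100):
    """Return the best per-call time in seconds for fn over several runs."""
    # timeit.repeat returns one total per run; the minimum is the least noisy.
    totals = timeit.repeat(fn, repeat=repeat, number=number)
    return min(totals) / number


# Hypothetical stand-ins for the two code paths being compared.
def unfiltered():
    return [x for x in range(1000)]


def filtered():
    wanted = set(range(1000))
    return [x for x in range(1000) if x in wanted]


if __name__ == "__main__":
    print("unfiltered:", best_time(unfiltered))
    print("filtered:  ", best_time(filtered))
```

The numbers only mean something when measured against the real database with realistic row counts, so treat the stand-ins as placeholders.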
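There can’t be duplicates here: each TestAttempt points at exactly one UserProfile through its foreign key, so the join matches at most one profile row per attempt.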
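This can be checked on a small in-memory SQLite database. The sketch below uses a simplified, hypothetical schema that borrows the column names from the SQL in the question (the table names are shortened and the sample rows are invented), and runs the same kind of join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE userprofile (id INTEGER PRIMARY KEY, user_id INTEGER UNIQUE);
    CREATE TABLE testattempt (
        id INTEGER PRIMARY KEY,
        test_id INTEGER,
        student_id INTEGER REFERENCES userprofile(id)
    );
    INSERT INTO userprofile (id, user_id) VALUES (10, 1), (20, 2), (30, 3);
    -- Profile 10 has two attempts at test 1; that is fine, they are distinct rows.
    INSERT INTO testattempt (id, test_id, student_id) VALUES
        (1, 1, 10), (2, 1, 10), (3, 1, 20), (4, 2, 30);
""")

rows = conn.execute("""
    SELECT a.id
    FROM testattempt AS a
    INNER JOIN userprofile AS p ON a.student_id = p.id
    WHERE a.test_id = 1 AND p.user_id IN (1, 2, 3)
    ORDER BY a.id
""").fetchall()

# Each attempt id appears once, because student_id references a single profile.
attempt_ids = [row[0] for row in rows]
```

Duplicates would only become a concern if the filter traversed a relation in the "many" direction, which is not the case in the query shown in the question.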
Is it really a problem to check for the two situations? Here’s the hypothetical code:
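A minimal sketch of such a check might look like the following. The function names and the convention that `user_id_list=None` means "every profile" are assumptions made for illustration; the `profile__id__in` lookup is copied from the question and may need adjusting to the real field names.

```python
def attempts_to_score(test_attempts, user_id_list=None):
    """Pick the attempts to score, filtering only when a subset is requested.

    `test_attempts` is expected to behave like a Django related manager,
    i.e. it provides `.all()` and `.filter()`.
    """
    if user_id_list is None:
        # No subset requested: skip the __in lookup (and its join) entirely.
        return test_attempts.all()
    # Lookup taken from the question; adjust it to the real field names.
    return test_attempts.filter(profile__id__in=user_id_list)


def average_score(test_attempts, user_id_list=None):
    """Average of score() over the selected attempts, or None if there are none."""
    scores = [a.score() for a in attempts_to_score(test_attempts, user_id_list)]
    return sum(scores) / len(scores) if scores else None
```

Inside the Test model this would be called as `average_score(self.test_attempts)` for everyone, or `average_score(self.test_attempts, user_id_list)` for a subset. Passing `None` avoids having to compare the list against the full set of profile ids, which would itself cost a query.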
I don’t know what the score method does, but can’t you compute the average at the DB level? It will give you a much more noticeable performance boost. And don’t forget about caching.
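Computing the average at the DB level could look roughly like the sketch below, which uses the standard-library `sqlite3` module so that it runs on its own. It assumes a hypothetical stored `score` column on the attempt table; in the question the score is computed in Python, so it would first have to be persisted or expressed in SQL. In Django, the same idea is usually written with the `Avg` aggregate from `django.db.models` passed to `aggregate()` on the queryset.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE testattempt (
        id INTEGER PRIMARY KEY,
        test_id INTEGER,
        student_id INTEGER,
        score REAL  -- hypothetical stored score, not part of the original schema
    );
    INSERT INTO testattempt (test_id, student_id, score) VALUES
        (1, 10, 80.0), (1, 20, 90.0), (1, 30, 70.0), (2, 10, 50.0);
""")


def average_for_test(conn, test_id, student_ids=None):
    """Let the database compute the average; optionally restrict to some students."""
    sql = "SELECT AVG(score) FROM testattempt WHERE test_id = ?"
    params = [test_id]
    if student_ids is not None:
        if not student_ids:
            return None  # empty subset: nothing to average
        placeholders = ", ".join("?" for _ in student_ids)
        sql += f" AND student_id IN ({placeholders})"
        params.extend(student_ids)
    # AVG over zero rows yields NULL, which sqlite3 returns as None.
    return conn.execute(sql, params).fetchone()[0]
```

With this approach only a single number travels from the database to Python, instead of one row per attempt followed by a Python-side loop.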