I'm trying to refactor some code that has recently gotten really slow, and I came across a block that takes 5+ seconds to execute.
The block consists of two statements:
```csharp
IEnumerable<int> StudentIds = _entities.Filters
    .Where(x => x.TeacherId == Profile.TeacherId.Value && x.StudentId != null)
    .Select(x => x.StudentId)
    .Distinct<int>();
```
and
```csharp
_entities.StudentClassrooms
    .Include("ClassroomTerm.Classroom.School.District")
    .Include("ClassroomTerm.Teacher.Profile")
    .Include("Student")
    .Where(x => StudentIds.Contains(x.StudentId)
        && x.ClassroomTerm.IsActive
        && x.ClassroomTerm.Classroom.IsActive
        && x.ClassroomTerm.Classroom.School.IsActive
        && x.ClassroomTerm.Classroom.School.District.IsActive)
    .AsQueryable<StudentClassroom>();
```
It's a bit messy, but first I get a distinct list of IDs from one table (`Filters`), then I query another table (`StudentClassrooms`) using it.
These are relatively small tables, yet the query still takes 5+ seconds.
When I ran this in LINQPad, it showed the second query executing first, followed by 1000 "distinct" queries.
On a whim, I changed the `StudentIds` statement by simply adding `.ToArray()` at the end. That made a huge difference: the same query now completes in about 100 ms instead of 5+ seconds.
What’s the deal? What am I doing wrong?
This is one of the pitfalls of deferred execution in LINQ. In your first approach, `StudentIds` is declared as `IEnumerable<int>`, but it is really an `IQueryable`, not an in-memory collection. That means using it in the second query will run the query again on the database, each and every time.
Forcing execution of the first query with `ToArray()` makes `StudentIds` an in-memory collection, so the `Contains` part of your second query runs over a fixed sequence of items. This gets mapped to something equivalent to a SQL `where StudentId in (1,2,3,4)` query. That query will of course be much faster, since the sequence is determined once up front rather than every time the `Where` clause is evaluated. Your second query without `ToArray()` would (I would think) be mapped to a SQL query with a `where exists (...)` sub-query that gets evaluated for each row.
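To see the deferred-execution effect without a database, here is a minimal sketch that uses LINQ to Objects rather than Entity Framework, so no SQL is involved. `FilterStudentIds` is a hypothetical stand-in for the `Filters` query, and the 1000 ids in `classroomStudentIds` are made-up stand-ins for the `StudentClassrooms` rows. A counter records how many times the inner sequence is enumerated from the start, first through a deferred query and then after materializing it with `ToArray()`. It assumes C# 9+ top-level statements.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Counts how many times the inner sequence starts being enumerated.
int enumerations = 0;

// Hypothetical stand-in for the Filters query: the body (and the counter
// increment) runs again every time something enumerates the sequence.
IEnumerable<int> FilterStudentIds()
{
    enumerations++;
    foreach (var id in new[] { 1, 2, 2, 3, 3, 3 })
        yield return id;
}

// Hypothetical stand-in for the StudentClassrooms rows: 1000 student ids.
int[] classroomStudentIds = Enumerable.Range(1, 1000).ToArray();

// Deferred: this only describes the query; nothing has run yet.
IEnumerable<int> studentIds = FilterStudentIds().Distinct();

// Every Contains call re-runs FilterStudentIds().Distinct() from scratch.
int deferredMatches = classroomStudentIds.Count(id => studentIds.Contains(id));
int deferredRuns = enumerations;

// Materialized: run the query once, then Contains scans a fixed array.
enumerations = 0;
int[] studentIdArray = FilterStudentIds().Distinct().ToArray();
int materializedMatches = classroomStudentIds.Count(id => studentIdArray.Contains(id));
int materializedRuns = enumerations;

Console.WriteLine($"deferred: {deferredMatches} matches, {deferredRuns} enumerations");
Console.WriteLine($"materialized: {materializedMatches} matches, {materializedRuns} enumeration");
```

Both versions find the same 3 matches, but the deferred one enumerates the inner sequence 1000 times (once per `Contains` call) while the materialized one enumerates it once. With a real `IQueryable`, the query provider decides how that repeated evaluation turns into SQL, so treat this only as an in-memory analogue of why materializing the ids up front helps.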