We’ve got a slight performance issue in a section of code that’s using LINQ and it’s raised a question about how LINQ works in terms of lookups
My question is this (please note that I have changed all the code so this is an indicative example of the code, not the real scenario):
Given
public class Person {
int ID;
string Name;
DateTime Birthday;
int OrganisationID;
}
If I had a list of say 100k Person objects and then a list of dates, say 1000, and I ran this code:
var personBirthdays = from Person p in personList
where p.OrganisationID = 123
select p.Birthday;
foreach (DateTime d in dateList)
{
if (personBirthdays.Contains(d))
Console.WriteLine(string.Format("Date: {0} has a Birthday", d.ToShortDateString()));
}
The resultant code would be an iteration of:
100k (the loop that needs to be done to find the users with the organisationID 123)
multiplied by
1000 (the amount of dates in the list)
multiplied by
x (the amount of users who have the organisationID 123 to be checked against for the date)
This is a lot of iterations!
If I changed the code the personBirthdays to this:
List<DateTime> personBirthdays =
(from Person p in personList
where p.OrganisationID = 123
select p.Birthday).ToList();
This should remove the 100k as a multiple by, and just do it once?
So you would have 100k + (1000 * x) instead of (100k * 1000 * x).
The question is that this seems too easy, and I’m sure the LINQ is doing something clever somewhere that should mean that this doesn’t happen.
If no one answers, I’ll run some tests and report back.
Clarity update:
We’re not considering Database lookups, the personList object is an In Memory list object. This all LINQ-to-Objects.
What it means is that instead of iterating
personList100k times, performing thewhereandselectoperations for each of those iterations that you’ll be iterating the resultingList100k times, and that thewhereandselectoperations will only have been performed on the underlying data source once.Nope, your first query is simply something that you shouldn’t be doing using LINQ, you should be taking the results of the query and placing them into a data structure if you plan to iterate over them many times (which is what you changed).
You can improve this query even more by using the appropriate data structure. Searching on a
Listis rather inefficient, as it needs to do a linear search. It would be preferable to use aHashSetto store the results of the query. AHashSethas O(1) search speed in the average case, as opposed to O(n) search time of aList.