I am using LINQ to Objects and wonder if it is possible to improve the performance of my queries by making use of an index that I have. This is best explained with an example. Imagine a simple type…
    public class Person
    {
        public int Age;
        public string FirstName;
        public string LastName;
    }
And a simple query I would make against it…
    List<Person> people = new List<Person>();
    // 'people' populated with 50,000 instances...
    var x = from t in people
            where t.Age > 18 && t.Age < 21
            select t;
If I understand LINQ to Objects correctly then the implementation of the Where extension method will enumerate all 50,000 instances in the people collection in order to find the 100 that actually match. As it happens I already have an index of the people collection that is sorted by Age. Like this…
SortedList<int, Person> ageSorted = new SortedList<int, Person>();
Clearly it would make sense if I could get the Where to use the SortedList so that it no longer has to enumerate all 50,000 instances, instead finding the range of 100 matching entries and so saving time.
Is it possible to extend LINQ to Objects to enable my situation? Is it already possible but I am missing the technique?
There’s already a project which I believe does exactly that – i4o. I can’t say I’ve used it myself, but it sounds like the kind of thing you want… you may need to juggle your existing code a bit, but it’s certainly worth looking at.
If that doesn’t help, you could at least write your own extension methods on SortedList<TKey, TValue>. You probably wouldn’t be able to easily use your actual where clause, but you could use your own methods taking a minimum and a maximum value. You might also want to make them apply to IList<T> where you assert that you’ve already sorted the values appropriately (according to some comparer).

For example (completely untested):
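One way such a range method might look, as a minimal sketch only. The name Between, its parameters, and the half-open [min, max) convention are assumptions made for illustration, not an existing API. It assumes the list is already sorted by the projected key under the given comparer, and it does not check that.

```csharp
using System;
using System.Collections.Generic;

public static class SortedRangeExtensions
{
    // Hypothetical helper: yields the items whose key lies in
    // [minInclusive, maxExclusive). 'source' must already be sorted by
    // the projected key according to 'comparer'; this is not verified.
    public static IEnumerable<T> Between<T, TKey>(
        this IList<T> source,
        Func<T, TKey> keySelector,
        TKey minInclusive,
        TKey maxExclusive,
        IComparer<TKey> comparer = null)
    {
        comparer = comparer ?? Comparer<TKey>.Default;

        // Binary search for the first index whose key is >= minInclusive.
        int lo = 0, hi = source.Count;
        while (lo < hi)
        {
            int mid = lo + (hi - lo) / 2;
            if (comparer.Compare(keySelector(source[mid]), minInclusive) < 0)
                lo = mid + 1;
            else
                hi = mid;
        }

        // Walk forward until the key reaches maxExclusive.
        for (int i = lo;
             i < source.Count &&
             comparer.Compare(keySelector(source[i]), maxExclusive) < 0;
             i++)
        {
            yield return source[i];
        }
    }
}
```

With people held in a list already sorted by Age, the query from the question (Age > 18 && Age < 21) would become people.Between(p => p.Age, 19, 21). Finding the start of the range takes O(log n) comparisons, after which only the matching items are visited.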
(If you only have List<T> instead of IList<T>, you could use List<T>.BinarySearch, although you’d need to build a custom IComparer<T>.)

Also, have a look at SortedSet<T> in .NET 4.
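The List<T>.BinarySearch route could be sketched roughly as follows. AgeComparer and WithAgeBetween are hypothetical names, and the list is assumed to be sorted by Age already. BinarySearch returns the index of some matching element (not necessarily the first when ages repeat), or the bitwise complement of the index of the next larger element, so the sketch steps back over duplicates before yielding.

```csharp
using System;
using System.Collections.Generic;

public class Person
{
    public int Age;
    public string FirstName;
    public string LastName;
}

// Hypothetical comparer: orders Person instances by Age only.
public sealed class AgeComparer : IComparer<Person>
{
    public int Compare(Person x, Person y)
    {
        return x.Age.CompareTo(y.Age);
    }
}

public static class AgeRange
{
    // Yields people with minInclusive <= Age < maxExclusive.
    // 'sortedByAge' must already be sorted by Age; this is not verified.
    public static IEnumerable<Person> WithAgeBetween(
        List<Person> sortedByAge, int minInclusive, int maxExclusive)
    {
        var probe = new Person { Age = minInclusive };
        int index = sortedByAge.BinarySearch(probe, new AgeComparer());

        if (index < 0)
        {
            // No exact match: the complement is the index of the first
            // element with a larger Age (or Count if there is none).
            index = ~index;
        }
        else
        {
            // Exact match: step back to the first element with that Age.
            while (index > 0 && sortedByAge[index - 1].Age == minInclusive)
                index--;
        }

        for (; index < sortedByAge.Count &&
               sortedByAge[index].Age < maxExclusive; index++)
        {
            yield return sortedByAge[index];
        }
    }
}
```

The step back over duplicates is linear in the number of people sharing the minimum age, so this is only a reasonable trade-off when that group is small compared with the whole list.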