While looking at System.Linq.Enumerable through Reflector, I noticed that the default iterator used for the Select and Where extension methods (WhereSelectArrayIterator) does not implement the ICollection interface. If I read the code properly, this causes some other extension methods, such as Count() and ToList(), to run slower:
public static IEnumerable<TResult> Select<TSource, TResult>(this IEnumerable<TSource> source, Func<TSource, TResult> selector)
{
    // code above snipped
    if (source is List<TSource>)
    {
        return new WhereSelectListIterator<TSource, TResult>((List<TSource>) source, null, selector);
    }
    // code below snipped
}

private class WhereSelectListIterator<TSource, TResult> : Enumerable.Iterator<TResult>
{
    // Fields
    private List<TSource> source; // class has access to List source so can implement ICollection
    // code below snipped
}
public class List<T> : IList<T>, ICollection<T>, IEnumerable<T>, IList, ICollection, IEnumerable
{
    public List(IEnumerable<T> collection)
    {
        ICollection<T> is2 = collection as ICollection<T>;
        if (is2 != null)
        {
            int count = is2.Count;
            this._items = new T[count];
            is2.CopyTo(this._items, 0); // FAST
            this._size = count;
        }
        else
        {
            this._size = 0;
            this._items = new T[4];
            using (IEnumerator<T> enumerator = collection.GetEnumerator())
            {
                while (enumerator.MoveNext())
                {
                    this.Add(enumerator.Current); // SLOW, CAUSES ARRAY EXPANSION
                }
            }
        }
    }
}
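To make the idea concrete, here is a minimal sketch of what such a collection-aware projection could look like. SelectListCollection is a hypothetical class of my own, not something in System.Linq, and it only works because a plain Select (with no predicate) never changes the number of items. Note that it runs the selector again on every copy or enumeration.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

// Hypothetical helper, not part of System.Linq: a read-only projection over a List<T>
// whose Count is known up front because Select never drops elements.
public sealed class SelectListCollection<TSource, TResult> : ICollection<TResult>
{
    private readonly List<TSource> source;
    private readonly Func<TSource, TResult> selector;

    public SelectListCollection(List<TSource> source, Func<TSource, TResult> selector)
    {
        this.source = source ?? throw new ArgumentNullException(nameof(source));
        this.selector = selector ?? throw new ArgumentNullException(nameof(selector));
    }

    public int Count => source.Count;
    public bool IsReadOnly => true;

    // Writes the projected items straight into the destination array,
    // which is the path the List<T> constructor takes for an ICollection<T>.
    public void CopyTo(TResult[] array, int arrayIndex)
    {
        if (array == null) throw new ArgumentNullException(nameof(array));
        if (arrayIndex < 0 || array.Length - arrayIndex < source.Count)
            throw new ArgumentException("Destination array is too small.");
        for (int i = 0; i < source.Count; i++)
            array[arrayIndex + i] = selector(source[i]);
    }

    public IEnumerator<TResult> GetEnumerator()
    {
        foreach (TSource item in source) yield return selector(item);
    }

    IEnumerator IEnumerable.GetEnumerator() => GetEnumerator();

    public bool Contains(TResult item)
    {
        var comparer = EqualityComparer<TResult>.Default;
        foreach (TResult candidate in this)
            if (comparer.Equals(candidate, item)) return true;
        return false;
    }

    // The projection is read-only, so mutating members are not supported.
    public void Add(TResult item) => throw new NotSupportedException();
    public void Clear() => throw new NotSupportedException();
    public bool Remove(TResult item) => throw new NotSupportedException();
}
```

With this in place, new List<int>(new SelectListCollection<int, int>(source, k => k)) would go through the ICollection<T> branch of the constructor shown above instead of the Add loop.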
I’ve tested this with results confirming my suspicion:
ICollection: 2388.5222 ms
IEnumerable: 3308.3382 ms
Here’s the test code:
// prepare source
var n = 10000;
var source = new List<int>(n);
for (int i = 0; i < n; i++) source.Add(i);

// Test List creation using ICollection
var startTime = DateTime.Now;
for (int i = 0; i < n; i++)
{
    foreach (int l in source.Select(k => k)) { } // iterate to make comparison fair
    new List<int>(source);
}
var finishTime = DateTime.Now;
Response.Write("ICollection: " + (finishTime - startTime).TotalMilliseconds + " ms <br />");

// Test List creation using IEnumerable
startTime = DateTime.Now;
for (int i = 0; i < n; i++) new List<int>(source.Select(k => k));
finishTime = DateTime.Now;
Response.Write("IEnumerable: " + (finishTime - startTime).TotalMilliseconds + " ms");
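For anyone who wants to repeat the measurement, the same comparison can be sketched as a console program using Stopwatch, which is better suited to timing than DateTime.Now. The ListCopyTiming class and its Measure helper are names I made up for this sketch, and the numbers will vary with the runtime version and machine, so treat the output as indicative only.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;

public static class ListCopyTiming
{
    // Runs the action once as a warm-up, then `iterations` more times,
    // and returns the elapsed wall-clock time of the timed runs only.
    public static TimeSpan Measure(int iterations, Action action)
    {
        action(); // warm-up so JIT compilation is not counted
        var watch = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++) action();
        watch.Stop();
        return watch.Elapsed;
    }

    public static void Main()
    {
        const int n = 10000;
        List<int> source = Enumerable.Range(0, n).ToList();

        // List creation through the ICollection<T> branch
        TimeSpan viaCollection = Measure(n, () =>
        {
            foreach (int item in source.Select(k => k)) { } // iterate to keep the comparison fair
            new List<int>(source);
        });

        // List creation through the plain IEnumerable<T> branch
        TimeSpan viaEnumerable = Measure(n, () => new List<int>(source.Select(k => k)));

        Console.WriteLine("ICollection: " + viaCollection.TotalMilliseconds + " ms");
        Console.WriteLine("IEnumerable: " + viaEnumerable.TotalMilliseconds + " ms");
    }
}
```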
Am I missing something, or will this be fixed in future versions of the framework?
Thank you for your thoughts.
LINQ to Objects uses some tricks to optimize certain operations. For example, if you chain two .Where statements together, the predicates will be combined into a single WhereArrayIterator, so the previous ones can be garbage collected. Likewise, a Where followed by a Select will create a WhereSelectArrayIterator, passing the combined predicates as an argument so that the original WhereArrayIterator can be garbage collected. So the WhereSelectArrayIterator is responsible for tracking not only the selector, but also the combined predicate that it may or may not be based on.
The source field only keeps track of the initial list that was given. Because of the predicate, the iteration result will not always have the same number of items as source does. Since LINQ is intended to be lazily-evaluated, it shouldn’t evaluate the source against the predicate ahead of time just so that it can potentially save time if someone ends up calling .Count(). That would cause just as much of a performance hit as calling .ToList() on it manually, and if the user ran it through multiple Where and Select clauses, you’d end up constructing multiple lists unnecessarily.
Could LINQ to Objects be refactored to create a SelectArrayIterator that it uses when Select gets called directly on an array? Sure. Would it enhance performance? A little bit. At what cost? Less code reuse means additional code to maintain and test moving forward.
And thus we get to the crux of the vast majority of “Why doesn’t language/platform X have feature Y” questions: every feature and optimization has some cost associated with it, and even Microsoft doesn’t have unlimited resources. Just like every other company out there, they make judgment calls to determine how often code will be run that performs a Select on an array and then calls .ToList() on it, and whether making that run a little faster is worth writing and maintaining another class in the LINQ package.
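To illustrate the point about predicates and lazy evaluation, here is a small sketch. PredicateDemo and its Combine helper are hypothetical names of my own that only mimic the idea of folding two predicates into one delegate; they are not the framework's internal code. The counter shows that nothing runs until the query is enumerated, and that the result count cannot be known without running the predicate on every item.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class PredicateDemo
{
    // Hypothetical helper: folds two predicates into a single delegate.
    public static Func<T, bool> Combine<T>(Func<T, bool> first, Func<T, bool> second)
    {
        return x => first(x) && second(x);
    }

    public static void Main()
    {
        int[] source = { 1, 2, 3, 4, 5, 6 };
        int predicateCalls = 0;

        Func<int, bool> isEven = x => { predicateCalls++; return x % 2 == 0; };
        Func<int, bool> isLarge = x => x > 2;

        IEnumerable<string> query = source.Where(Combine(isEven, isLarge)).Select(x => "#" + x);

        Console.WriteLine(predicateCalls); // 0: building the query evaluates nothing
        Console.WriteLine(query.Count());  // 2: only 4 and 6 pass, although source has 6 items
        Console.WriteLine(predicateCalls); // 6: counting had to run the predicate on every item
    }
}
```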