From my mentor: Prefer native methods (implemented directly on the collection) over extension methods of IEnumerable, because:
The LINQ-to-Objects extension methods
are implemented on IEnumerable,
meaning that in the worst-case
scenario (when the item you search for
does not exist in the collection) you
will have to enumerate thru all
elements. If you have a Contains or
Exists method implemented directly on
the collection, it could make use of
internal knowledge and maybe just do a
hash table look up or some other quick
operation.
I was a deeply confused, because I think Microsoft should have implemented hash table for IEnumerable Contains/Exists already. A quick benchmark with List and IEnumerable show no differences:
static void Main(string[] args)
{
Console.Write("input the number of elements: ");
int count = Convert.ToInt32(Console.ReadLine());
Console.Write("input the number of loops: ");
int loop = Convert.ToInt32(Console.ReadLine());
Random r = new Random();
Stopwatch sw = new Stopwatch();
for (int i = 0; i < loop; i++)
{
var list = CreateListOfInt(count);
sw.Start();
for (int j = 0; j < count; j++)
{
DoContains(list, r.Next());
}
sw.Stop();
}
Console.WriteLine("List<T> native method: Iterated {0} times on {1} elements, elapsed :{2}",loop,count,sw.Elapsed);
sw.Reset();
for (int i = 0; i < loop; i++)
{
var list = CreateListOfInt(count);
sw.Start();
for (int j = 0; j < count; j++)
{
DoContainsEnumerable(list, r.Next());
}
sw.Stop();
}
Console.WriteLine("IEnumerable<T> extension method: Iterated {0} times on {1} elements, elapsed :{2}", loop, count, sw.Elapsed);
sw.Reset();
for (int i = 0; i < loop; i++)
{
var list = CreateListOfInt2(count);
sw.Start();
for (int j = 0; j < count; j++)
{
//make sure that the element is not in the list
DoContains(list, r.Next(20000, 50000));
}
sw.Stop();
}
Console.WriteLine("List<T> native method: element does not exist:Iterated {0} times on {1} elements, elapsed :{2}", loop, count, sw.Elapsed);
sw.Reset();
for (int i = 0; i < loop; i++)
{
var list = CreateListOfInt2(count);
sw.Start();
for (int j = 0; j < count; j++)
{
//make sure that the element is not in the list
DoContainsEnumerable(list, r.Next(20000, 50000));
}
sw.Stop();
}
Console.WriteLine("IEnumerable<T> extension method: element does not exist: Iterated {0} times on {1} elements, elapsed :{2}", loop, count, sw.Elapsed);
Console.ReadKey();
}
static List<int> CreateListOfInt(int count)
{
Random r = new Random(1000);
List<int> numbers = new List<int>(count);
for (int i = 0; i < count; i++)
{
numbers.Add(r.Next());
}
return numbers;
}
static bool DoContains(List<int> list, int number)
{
return list.Contains(number);
}
static bool DoContainsEnumerable(IEnumerable<int> list, int number)
{
return list.Contains(number);
}
//define the scope of randomly created number, to make sure that lookup number will not in the List
static List<int> CreateListOfInt2(int count)
{
Random r = new Random(1000);
List<int> numbers = new List<int>(count);
for (int i = 0; i < count; i++)
{
numbers.Add(r.Next(0,10000));
}
return numbers;
}
}
Edit: I tried HashSet implementation, which greatly increases performance:
sw.Reset();
for (int i = 0; i < loop; i++)
{
var list = CreateListOfInt2(count);
HashSet<int> hashtable = new HashSet<int>(list);
sw.Start();
for (int j = 0; j < count; j++)
{
//make sure that the element is not in the list
hashtable.Contains(r.Next(20000, 50000));
}
sw.Stop();
}
Console.WriteLine("IEnumerable<T> extension method: element does not exist: Iterated {0} times on {1} elements, elapsed :{2}", loop, count, sw.Elapsed);
Still, what is your opinion about my mentor saying?
Can anyone clear out for me? Is my mentor right? If he’s right, what is wrong with my code?
Thank you very much
List<T>Containscalls are just iterating the list, so they won’t be faster than the extension method. If you were to use aHashSet<T>and try a series ofContains()operations, you would find a marked improvement.Edit: the reason Microsoft didn’t utilize a hash for the
IEnumerable<T>extension methods is they could not guarantee that the implementing class used a hash or something similar. They had to go with the naive approach because theIEnumerable<T>interface only guarantees that the implementing class be enumerated.