For some reason, it seems the Add operation on a HashSet is slower than the Contains operation when the element already exists in the HashSet.
Here is proof:
    Stopwatch watch = new Stopwatch();
    int size = 10000;
    int iterations = 10000;

    // Pre-populate the set so every later Add hits an existing element.
    var s = new HashSet<int>();
    for (int i = 0; i < size; i++) {
        s.Add(i);
    }

    // Case 1: call Add directly on elements that already exist.
    Console.WriteLine(watch.Time(() =>
    {
        for (int i = 0; i < size; i++) {
            s.Add(i);
        }
    }, iterations));
    // outputs: 47,074,764

    s = new HashSet<int>();
    for (int i = 0; i < size; i++) {
        s.Add(i);
    }

    // Case 2: guard Add with Contains.
    Console.WriteLine(watch.Time(() =>
    {
        for (int i = 0; i < size; i++) {
            if (!s.Contains(i))
                s.Add(i);
        }
    }, iterations));
    // outputs: 41,125,219
Why is Contains faster than Add for already-existing elements?
Note: I’m using this Stopwatch extension from another SO question.
    public static long Time(this Stopwatch sw, Action action, int iterations)
    {
        sw.Reset();
        sw.Start();
        for (int i = 0; i < iterations; i++) {
            action();
        }
        sw.Stop();

        return sw.ElapsedTicks;
    }
UPDATE: Internal testing has revealed that the big performance difference only happens on the x64 version of the .NET Framework. With the 32-bit version of the framework, Contains seems to run at identical speed to Add (in fact, the version with the Contains appears to run a percent slower in some test runs). On x64 versions of the framework, the version with the Contains seems to run about 15% faster.
AddIfNotPresent does an additional divide that Contains doesn’t perform. Take a look at the IL for Contains:
This is computing the bucket location for the hash code. The result is saved at local memory location 1.
AddIfNotPresent does something similar, but it also saves the computed value at location 2, so that it can insert the item into the hash table at that position if the item doesn’t exist. It does that save because one of the locations is modified later in the loop that goes looking for the item. Anyway, here’s the relevant code for AddIfNotPresent:
I think the extra divide is what’s causing Add to take more time than Contains. At first glance, it looks like that extra divide could be factored out, but I can’t say for sure without spending a little more time deciphering the IL.
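To make the pattern described above easier to picture, here is a minimal sketch in C#. It is not the framework's HashSet source: ToyIntSet, its fields, and the fixed capacity are all hypothetical and exist only for illustration. It shows a lookup that computes the bucket index with one modulo, next to an add that computes it twice, once to remember the insertion bucket and once more to start the chain walk.

```csharp
using System;

// Illustrative only: a toy chained hash set for ints.
// This is NOT the HashSet<T> implementation; all names here are hypothetical.
public sealed class ToyIntSet
{
    private readonly int[] buckets; // 1-based index of the first slot in each chain; 0 means empty
    private readonly int[] values;  // stored items
    private readonly int[] next;    // index of the next slot in the chain, or -1
    private int count;

    public ToyIntSet(int capacity)
    {
        if (capacity <= 0) throw new ArgumentOutOfRangeException(nameof(capacity));
        buckets = new int[capacity];
        values = new int[capacity];
        next = new int[capacity];
    }

    public bool Contains(int item)
    {
        int hashCode = item & 0x7FFFFFFF;
        // One modulo: the bucket index is only needed to start the chain walk.
        for (int i = buckets[hashCode % buckets.Length] - 1; i >= 0; i = next[i])
        {
            if (values[i] == item) return true;
        }
        return false;
    }

    public bool Add(int item)
    {
        int hashCode = item & 0x7FFFFFFF;
        // First modulo: remembered so a new item can be linked into this bucket.
        int bucket = hashCode % buckets.Length;
        // Second modulo: recomputed for the chain walk, mirroring the redundancy
        // the answer describes. Reusing 'bucket' here would avoid it.
        for (int i = buckets[hashCode % buckets.Length] - 1; i >= 0; i = next[i])
        {
            if (values[i] == item) return false; // already present, nothing inserted
        }
        if (count == values.Length)
            throw new InvalidOperationException("ToyIntSet is full (this sketch does not resize).");
        values[count] = item;
        next[count] = buckets[bucket] - 1; // link to the previous head of the chain
        buckets[bucket] = count + 1;       // new item becomes the head (stored 1-based)
        count++;
        return true;
    }
}

public static class Demo
{
    public static void Main()
    {
        var set = new ToyIntSet(16);
        Console.WriteLine(set.Add(5));       // True: newly inserted
        Console.WriteLine(set.Add(5));       // False: already present
        Console.WriteLine(set.Contains(5));  // True
        Console.WriteLine(set.Contains(21)); // False: same bucket as 5, different value
    }
}
```

In this sketch the second modulo in Add is plainly redundant, and a JIT compiler may or may not eliminate it. Whether the same holds for the real AddIfNotPresent depends on details of the IL that this toy does not capture.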