After reading this very interesting thread on duplicate removal, I ended up with this:
public static IEnumerable<T> deDuplicateCollection<T>(IEnumerable<T> input)
{
    var hs = new HashSet<T>();
    foreach (T t in input)
        if (hs.Add(t))
            yield return t;
}
By the way, as I’m brand new to C# and coming from Python, I’m a bit lost between casting and this kind of thing. I was able to compile and build with:
foreach (KeyValuePair<long, List<string>> kvp in d)
{
    d[kvp.Key] = (List<string>) deDuplicateCollection(kvp.Value);
}
But I must have missed something here, as I get a “System.InvalidCastException” at runtime. Could you point out what I should know about casting and where I went wrong? Thank you in advance.
First, about the usage of the method. Drop the cast and call `ToList()` on the result of the method instead. The result of the method is an `IEnumerable<string>`, which is not a `List<string>`. The fact that the source is originally a `List<string>` is irrelevant: you don’t return the list, you `yield return` a sequence.

Second, your `deDuplicateCollection` method is redundant: `Distinct()` already exists in the library and performs the same function.

Just be sure you have `using System.Linq;` in the directives so you can use the `Distinct()` and `ToList()` extension methods.

Finally, you’ll notice that after making this change alone, you run into a new exception when trying to change the dictionary in the loop. You cannot update the collection in a `foreach`. The simplest way to do what you want is to omit the explicit loop entirely. Consider rebuilding the dictionary with another Linq extension method, `ToDictionary()`. Note: this creates a new dictionary in memory and updates `d` to reference it. If you need to preserve the original dictionary as referenced by `d`, then you would need to approach this another way. A simple option here is to build a dictionary to shadow `d` in one loop, and then update `d` from it in a second loop. Those two loops are safe, but you need to loop twice to avoid updating the original collection while enumerating over it, while also preserving the original collection in memory.
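As a minimal sketch of the two approaches described above, assuming `d` is a `Dictionary<long, List<string>>` as in the question. The class name `DedupDemo`, the helpers `Rebuild` and `UpdateInPlace`, and the sample data are hypothetical and only there for illustration:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DedupDemo
{
    // Approach 1 (hypothetical helper): build a brand-new dictionary.
    // The caller reassigns its variable, so the original instance is not modified.
    public static Dictionary<long, List<string>> Rebuild(Dictionary<long, List<string>> d)
    {
        return d.ToDictionary(kvp => kvp.Key, kvp => kvp.Value.Distinct().ToList());
    }

    // Approach 2 (hypothetical helper): keep the same dictionary instance.
    public static void UpdateInPlace(Dictionary<long, List<string>> d)
    {
        // First loop: d is only read; results go into a shadow dictionary.
        var shadow = new Dictionary<long, List<string>>();
        foreach (KeyValuePair<long, List<string>> kvp in d)
        {
            shadow[kvp.Key] = kvp.Value.Distinct().ToList();
        }

        // Second loop: we enumerate the shadow, not d, so writing to d is safe.
        foreach (KeyValuePair<long, List<string>> kvp in shadow)
        {
            d[kvp.Key] = kvp.Value;
        }
    }

    public static void Main()
    {
        // Made-up sample data.
        var d = new Dictionary<long, List<string>>
        {
            { 1L, new List<string> { "a", "b", "a" } },
            { 2L, new List<string> { "x", "x" } },
        };

        UpdateInPlace(d);   // same instance, values replaced
        d = Rebuild(d);     // d now references a new dictionary

        foreach (KeyValuePair<long, List<string>> kvp in d)
        {
            Console.WriteLine(kvp.Key + ": " + string.Join(", ", kvp.Value));
        }
    }
}
```

`Rebuild` is the shorter option when nothing else holds a reference to the original dictionary. `UpdateInPlace` costs an extra pass and a temporary dictionary, but any other code holding a reference to `d` sees the deduplicated lists.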