I hate posting this since it’s somewhat subjective, but it feels like there’s a better method to do this that I’m just not thinking of.
Sometimes I want to ‘distinct’ a collection by certain columns/properties but without throwing away other columns (yes, this does lose information, as it becomes arbitrary which values of those other columns you’ll end up with).
Note that this extension is less powerful than the Distinct overloads that take an IEqualityComparer<T> since such things could do much more complex comparison logic, but this is all I need for now 🙂
public static IEnumerable<TSource> DistinctBy<TSource, TKey>(this IEnumerable<TSource> source, Func<TSource, TKey> getKeyFunc)
{
return from s in source
group s by getKeyFunc(s) into sourceGroups
select sourceGroups.First();
}
Example usage:
var items = new[]
{
new { A = 1, B = "foo", C = Guid.NewGuid(), },
new { A = 2, B = "foo", C = Guid.NewGuid(), },
new { A = 1, B = "bar", C = Guid.NewGuid(), },
new { A = 2, B = "bar", C = Guid.NewGuid(), },
};
var itemsByA = items.DistinctBy(item => item.A).ToList();
var itemsByB = items.DistinctBy(item => item.B).ToList();
Here you go. I don’t think this is massively more efficient than your own version, but it should have a slight edge. It only requires a single pass through the sequence,
yielding each item as it goes, rather than needing to group the entire sequence first.