I’m using the following queries to detect duplicates in a database. Using a LINQ

Asked: June 13, 2026

I’m using the following queries to detect duplicates in a database.

Using a LINQ join doesn’t work very well because “Company X” may also be listed as “CompanyX”, so I’d like to amend this to detect “near duplicates”.

var results = result
    .GroupBy(c => new { c.CompanyName })
    .Select(g => new CompanyGridViewModel
    {
        LeadId = g.First().LeadId,
        Qty = g.Count(),
        CompanyName = g.Key.CompanyName,
    })
    .ToList();

Could anybody suggest a way to get better control over the comparison? Perhaps via an IEqualityComparer (although I’m not exactly sure how that would work in this situation).

My main goals are:

To list the first record with a subset of all duplicates (or “near duplicates”)
To have some flexibility over the fields and text comparisons I use for my duplicates.

1 Answer

Editorial Team · Answer 1 · 2026-06-13T20:58:10+00:00

For your explicit “ignoring spaces” case, you can simply call

var results = result.GroupBy(c => c.Name.Replace(" ", ""))...
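Filled out against the query in the question, the one-liner might look like the sketch below. It uses the question's CompanyName property rather than Name. The Lead class, the DuplicateFinder helper and its method name are hypothetical stand-ins, since the asker's entity type isn't shown. It is written for LINQ to Objects; whether a database provider can translate the key expression depends on the provider.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the asker's entity (not shown in the question).
public class Lead
{
    public int LeadId { get; set; }
    public string CompanyName { get; set; } = "";
}

// Shape taken from the properties the question's query assigns.
public class CompanyGridViewModel
{
    public int LeadId { get; set; }
    public int Qty { get; set; }
    public string CompanyName { get; set; } = "";
}

public static class DuplicateFinder
{
    // Groups leads whose company names are equal once spaces are removed.
    public static List<CompanyGridViewModel> GroupIgnoringSpaces(IEnumerable<Lead> leads)
    {
        return leads
            .GroupBy(c => c.CompanyName.Replace(" ", ""))
            .Select(g => new CompanyGridViewModel
            {
                LeadId = g.First().LeadId,           // first record in the group
                Qty = g.Count(),                     // how many records share the key
                CompanyName = g.First().CompanyName, // original spelling, not the stripped key
            })
            .ToList();
    }
}
```

Because the group key is now the stripped name, the sketch reports the first record's CompanyName instead of g.Key, so the grid shows a spelling that actually exists in the data.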

However, in the general case where you want flexibility, I’d build up a library of IEqualityComparer<Company> classes to use in your groupings. For example, this should do the same in your “ignore space” case:

public class CompanyNameIgnoringSpaces : IEqualityComparer<Company>
{
    public bool Equals(Company x, Company y)
    {
        return x.Name.Replace(" ", "") == y.Name.Replace(" ", "");
    }

    public int GetHashCode(Company obj)
    {
        return obj.Name.Replace(" ", "").GetHashCode();
    }
}

which you could use as

var results = result.GroupBy(c => c, new CompanyNameIgnoringSpaces())...

It’s pretty straightforward to write similar comparers that look at multiple fields or use other definitions of similarity.
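As a rough illustration of a multi-field comparer, the sketch below treats two companies as duplicates when both name and postcode match after stripping spaces and ignoring case. The Postcode property, the case-insensitivity and the class name are assumptions made for the example; they are not part of the question.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical entity; Postcode is an assumed second field.
public class Company
{
    public string Name { get; set; } = "";
    public string Postcode { get; set; } = "";
}

public class CompanyNameAndPostcodeComparer : IEqualityComparer<Company>
{
    // Canonical form used by both Equals and GetHashCode.
    private static string Normalize(string s)
    {
        return (s ?? "").Replace(" ", "").ToUpperInvariant();
    }

    public bool Equals(Company x, Company y)
    {
        if (ReferenceEquals(x, y)) return true;
        if (x is null || y is null) return false;
        return Normalize(x.Name) == Normalize(y.Name)
            && Normalize(x.Postcode) == Normalize(y.Postcode);
    }

    public int GetHashCode(Company obj)
    {
        // Hash exactly the values Equals compares, so equal objects get equal hashes.
        return (Normalize(obj.Name), Normalize(obj.Postcode)).GetHashCode();
    }
}
```

It plugs into the grouping the same way as before, e.g. result.GroupBy(c => c, new CompanyNameAndPostcodeComparer()). The important detail is that GetHashCode is built from the same normalized values as Equals; if the two disagree, GroupBy can place equal items in different groups.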

Just note that your definition of “similar” must be transitive. For example, if you’re looking at integers you can’t define “similar” as “within 5”, because then you’d have “0 is similar to 5” and “5 is similar to 10” but not “0 is similar to 10”. (It must also be reflexive and symmetric, but that’s more straightforward.)
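One way to stay on the right side of that rule is to define similarity as “maps to the same canonical key”. Equality of keys is reflexive, symmetric and transitive, so a comparer built on it is too. The KeyComparer class below is a hypothetical helper written for this sketch, not a framework type.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: two items are "similar" when their keys are equal.
public sealed class KeyComparer<T, TKey> : IEqualityComparer<T>
{
    private readonly Func<T, TKey> _key;

    public KeyComparer(Func<T, TKey> key)
    {
        _key = key ?? throw new ArgumentNullException(nameof(key));
    }

    public bool Equals(T x, T y)
    {
        return EqualityComparer<TKey>.Default.Equals(_key(x), _key(y));
    }

    public int GetHashCode(T obj)
    {
        return EqualityComparer<TKey>.Default.GetHashCode(_key(obj));
    }
}

public static class KeyComparerExamples
{
    // Names are similar when they match with spaces removed.
    public static readonly KeyComparer<string, string> IgnoreSpaces =
        new KeyComparer<string, string>(s => s.Replace(" ", ""));

    // Non-negative integers are similar when they fall in the same block of five
    // (0-4, 5-9, ...). This is transitive, unlike "within 5", but it is a
    // different relation: 4 and 5 land in different blocks.
    public static readonly KeyComparer<int, int> SameBlockOfFive =
        new KeyComparer<int, int>(n => n / 5);
}
```

The trade-off is visible in the integer case: bucketing gives well-defined groups, but values that sit close together on either side of a bucket boundary are no longer grouped.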
