I’m using SQL Server 2008. I have a table, Customers, with these columns:
    customer_number int
    field1 varchar
    field2 varchar
    field3 varchar
    field4 varchar
… and a lot more columns that don’t matter for my queries.
The customer_number column is the primary key. I’m trying to find duplicate values and some differences between them.
Please help me find all rows where:
1) field1, field2, field3 and field4 are all equal
2) exactly 3 of the columns are equal and one isn’t (excluding rows from list 1)
3) exactly 2 of the columns are equal and two aren’t (excluding rows from list 1 and list 2)
In the end, I’ll have 3 tables with these results and an additional groupId, which will be the same for each group of similar rows. (For example, in the 3-column case, rows that share the same 3 equal columns will form a separate group.)
Thank you.
The easiest approach would probably be to write a stored procedure that iterates over each group of customers with duplicates and inserts the matching rows with their group number.
However, I’ve thought about it and you can probably do this with a subquery. Hopefully I haven’t made it more complicated than it ought to be, but this should get you what you’re looking for in the first table of duplicates (all four fields). Note that this is untested, so it might need a little tweaking.
Basically, it gets each group of fields where there are duplicates, a group number for each, then gets all customers with those fields and assigns the same group number.
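One way the subquery approach described above could look is sketched below. It is untested and only illustrative: the table and column names come from the question, while the group_id alias and the use of ROW_NUMBER() to number the groups are assumptions.

```sql
-- Sketch: customers whose field1..field4 all match at least one other customer.
-- Assumes the Customers table and column names from the question.
SELECT c.customer_number,
       g.group_id,
       c.field1, c.field2, c.field3, c.field4
FROM Customers AS c
JOIN (
    -- One row per (field1..field4) combination shared by two or more customers.
    -- ROW_NUMBER() runs after GROUP BY/HAVING, so it numbers the groups.
    SELECT field1, field2, field3, field4,
           ROW_NUMBER() OVER (ORDER BY field1, field2, field3, field4) AS group_id
    FROM Customers
    GROUP BY field1, field2, field3, field4
    HAVING COUNT(*) > 1
) AS g
  ON  c.field1 = g.field1
  AND c.field2 = g.field2
  AND c.field3 = g.field3
  AND c.field4 = g.field4
ORDER BY g.group_id, c.customer_number;
```

Note that GROUP BY treats NULLs as equal while the join's = comparison does not, so groups that contain a NULL in any of the four fields would be dropped by this sketch. Adding an INTO clause after the select list would store the result in a table.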
The other ones are a bit more complicated, however, as you’ll need to expand out the possibilities. The three-field groups would then be:
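A possible shape for the three-field case, again as an untested sketch and not necessarily the query originally intended here. It assumes the names from the question; the CTE names (candidates, matches) and the combo, combo_group, group_size and group_id columns are made up for illustration. It first drops rows that belong to a four-field duplicate group, then checks each of the four ways to pick three columns.

```sql
-- Sketch: rows that share exactly three of the four fields with another row.
-- If another statement precedes WITH, it must be terminated with a semicolon.
WITH candidates AS (
    -- Exclude list 1: rows that have a full four-field duplicate.
    SELECT c.customer_number, c.field1, c.field2, c.field3, c.field4
    FROM Customers AS c
    WHERE NOT EXISTS (
        SELECT 1
        FROM Customers AS d
        WHERE d.customer_number <> c.customer_number
          AND d.field1 = c.field1 AND d.field2 = c.field2
          AND d.field3 = c.field3 AND d.field4 = c.field4
    )
),
matches AS (
    -- combo 1: field1, field2, field3 match (field4 differs)
    SELECT customer_number, 1 AS combo,
           DENSE_RANK() OVER (ORDER BY field1, field2, field3) AS combo_group,
           COUNT(*) OVER (PARTITION BY field1, field2, field3) AS group_size
    FROM candidates
    UNION ALL
    -- combo 2: field1, field2, field4 match (field3 differs)
    SELECT customer_number, 2,
           DENSE_RANK() OVER (ORDER BY field1, field2, field4),
           COUNT(*) OVER (PARTITION BY field1, field2, field4)
    FROM candidates
    UNION ALL
    -- combo 3: field1, field3, field4 match (field2 differs)
    SELECT customer_number, 3,
           DENSE_RANK() OVER (ORDER BY field1, field3, field4),
           COUNT(*) OVER (PARTITION BY field1, field3, field4)
    FROM candidates
    UNION ALL
    -- combo 4: field2, field3, field4 match (field1 differs)
    SELECT customer_number, 4,
           DENSE_RANK() OVER (ORDER BY field2, field3, field4),
           COUNT(*) OVER (PARTITION BY field2, field3, field4)
    FROM candidates
)
SELECT m.customer_number,
       -- Renumber the surviving groups consecutively across all four combos.
       DENSE_RANK() OVER (ORDER BY m.combo, m.combo_group) AS group_id,
       m.combo
FROM matches AS m
WHERE m.group_size > 1
ORDER BY group_id, m.customer_number;
```

Because full duplicates are removed first, any two remaining rows that agree on three columns necessarily differ on the fourth. A row can appear in more than one group if it matches different rows on different column triples, and rows with NULLs in the compared fields may need special handling.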
Hopefully this produces the right results and I’ll leave the last one as an exercise. 😀