I have a SQL table containing customer information (we’ll call it CustomerDB), including an address column. Many of the rows in this table have exact duplicate addresses. The business requirement is to retrieve rows from the database such that:
- If there are more then three exact duplicates, only take three (ignore the rest)
- Which three is not important; any three will do (it does not have to be the "first three").
So, if I have a data set like this (I have also entered this in the Data Explorer, to make it easy to test queries against):
ID NAME COMPANY ADDR1
16242 TOM E Company A 101 First RD
16241 RONALD J Company B 12 Tenth AVE
16235 KENNETH H 12 Tenth AVE
16238 MICHAEL H Company C 12 Tenth AVE
16243 ANTONIO D Company D 264 Long ST STE 5
16237 MICHAEL B Company E 264 Long ST STE 5
16234 WALTER L Company F 73 North RD
16236 CARL O Company G 73 North RD
16239 MICHAEL S Company H 73 North RD
16240 MICHAEL I Company I 73 North RD
I would like to get all of those rows except for one of the "73 North RD" records. I hope I’m making some sense here.
My mind does not work very well in set-based operations, so I’m confused about how to do this. I would prefer a solution with an explanation of why it works. Since I’d like to be "taught how to fish", so to speak =)
Additional Information:
IDis an int primary key (auto-incremented)- All other columns are text.
- Sometimes I get these datasets as Access DBs, sometimes they are in SQL Server. So, I’d prefer a solution that works in both (i.e. does not use
CROSS APPLYorCTE‘s)
A possible solution could be
The magic (nice definition you gave) is done by
GROUP BY ... HAVING ...part.Sure you know
WHEREis used to filter data while selecting; well,HAVINGis used in conjunction withGROUP BYto filter grouped data (so after grouping).So I first group data (order is important) by address and then limit the count for each group to three.
Now let’s talk about the inner part (you didn’t ask this, but I thought it was good to provide you a complete example).
The
ONpart is used to tell the engine which fields are to be used to join tables; here I match tables on address and then I take records having an higher id (so, sorting ascending).