I need to write a query that will go through a table, find duplicate pairs of data, remove one of them, and then update any tables that were pointing to the removed item to point to the one I am keeping.
I have two issues with this. The first is that I am not too hot on procedural SQL, so I don’t really know what structures etc. would be best to accomplish this.
The second is where it gets tricky. The table holds addresses, and each address has the ID of the customer it belongs to. A customer can have many addresses, and must have at least one residential address. If a customer has two addresses, this is rather simple: check whether they are the same and, if so, keep the residential one, remove the other, and update the links. For this case I just need to know the above.
But what about when the customer has more than two addresses? How do you go about checking the pairs one by one? I know you can join the table to itself, but how do you decide which one to keep? Going through all the pairs will delete all the addresses if they are all identical!
Then there is the case where one residential address exists along with two identical postal addresses: I need to remove one of the postal addresses, but still check that a residential address exists for the customer.
I realise this is a lot of corner cases, so if anyone can simply help with the structures to use to make these checks, that would be immensely helpful!
For you visual people, assume the data looks like this:
ID  CustID  Address  Type
1   25      123 St   R
2   36      567 Rd   R
3   36      567 Rd   R     < should be removed
4   36      567 Rd   P     < should be removed
5   25      99 Lane  P
6   25      99 Lane  P     < should be removed
7   25      66 Way   P
You should be able to do something similar to what you want using two queries.
Demo here.
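As a rough illustration of the two-query idea, here is a minimal sketch. It assumes SQLite (driven from Python's sqlite3 module so that it runs on its own), a table named address, and a hypothetical referencing table shipment with an address_id column; none of these names come from the question. It also assumes that two rows are duplicates when CustID and Address match regardless of Type, which is what the sample data implies. For each group of duplicates the row to keep is the residential one if there is one, otherwise the one with the lowest ID.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Sample data from the question, plus a hypothetical table that points at it.
conn.executescript("""
CREATE TABLE address (id INTEGER PRIMARY KEY, cust_id INTEGER,
                      address TEXT, type TEXT);
CREATE TABLE shipment (id INTEGER PRIMARY KEY, address_id INTEGER);
INSERT INTO address VALUES
    (1, 25, '123 St',  'R'), (2, 36, '567 Rd',  'R'),
    (3, 36, '567 Rd',  'R'), (4, 36, '567 Rd',  'P'),
    (5, 25, '99 Lane', 'P'), (6, 25, '99 Lane', 'P'),
    (7, 25, '66 Way',  'P');
INSERT INTO shipment VALUES (1, 3), (2, 4), (3, 6), (4, 7);
""")

# Query 1: repoint every reference to the row that will survive.
# Within a (cust_id, address) group, 'R' rows sort first, then lowest id.
REPOINT = """
UPDATE shipment
SET address_id = (
    SELECT k.id
    FROM address d
    JOIN address k ON k.cust_id = d.cust_id AND k.address = d.address
    WHERE d.id = shipment.address_id
    ORDER BY CASE k.type WHEN 'R' THEN 0 ELSE 1 END, k.id
    LIMIT 1
)
WHERE address_id IN (SELECT id FROM address)
"""

# Query 2: delete every row that is not the survivor of its group.
# The survivor is chosen with the same ordering as in query 1.
DELETE_DUPLICATES = """
DELETE FROM address
WHERE id <> (
    SELECT k.id
    FROM address k
    WHERE k.cust_id = address.cust_id AND k.address = address.address
    ORDER BY CASE k.type WHEN 'R' THEN 0 ELSE 1 END, k.id
    LIMIT 1
)
"""

with conn:  # run both statements in one transaction
    conn.execute(REPOINT)
    conn.execute(DELETE_DUPLICATES)

remaining = [row[0] for row in conn.execute("SELECT id FROM address ORDER BY id")]
shipments = conn.execute("SELECT id, address_id FROM shipment ORDER BY id").fetchall()

# Sanity check: customers left without any residential address.
missing_residential = conn.execute("""
    SELECT DISTINCT cust_id FROM address
    WHERE cust_id NOT IN (SELECT cust_id FROM address WHERE type = 'R')
""").fetchall()

print(remaining)            # [1, 2, 5, 7]
print(shipments)            # [(1, 2), (2, 2), (3, 5), (4, 7)]
print(missing_residential)  # []
```

Because the survivor of each group is picked by a fixed ordering rather than by comparing pairs, a group of identical addresses always keeps exactly one row, and a customer never loses the only residential copy of an address. On other database engines the LIMIT 1 subquery would likely need a different spelling, for example a window function such as ROW_NUMBER().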