The problem We have a table of duplicate customer numbers: A varchar(16) NOT NULL,

Question


Editorial Team

Asked: May 25, 2026 (2026-05-25T20:47:20+00:00)


The problem

We have a table of duplicate customer numbers:

A varchar(16) NOT NULL,
B varchar(16) NOT NULL

These columns started off as Old and New (Delete and Retain), but devolved to where neither is preferred. The columns really are just “A” and “B” — two numbers for the same customer, in any order.

Furthermore, the table can have an arbitrary number of pairs for the same customer. You might see rows like

a,b
b,c

meaning a,b,c are all for the same customer. You might also see rows like

a,b
b,a
c,a

meaning a,b,c are all the same customer.

It’s not a clean acyclic representation like “old” and “new” values. The list of customer IDs for a customer is spread across one or more rows of this table, and the only connection between those rows is that a value in the A or B column of one row might show up in the A or B column of some other row. My mission is to tie them all together into the list for each customer.

I want to convert this mess to something like

MasterKey int NOT NULL,
CustNum varchar(16) NOT NULL UNIQUE,
PRIMARY KEY( MasterKey, CustNum )

The one or more numbers for a customer would share the MasterKey in this table. As the UNIQUE constraint says, a given CustNum can’t appear more than once.

So for example, rows like this from the original

1a,1b
1b,1c
2a,2b
2b,2c
2d,2a
...

should end up as rows like this in the new table

1 1a
1 1b
1 1c
2 2a
2 2b
2 2c
2 2d
...

Edit: The values above are just to make the pattern clear. The actual customer number values are arbitrary varchars.
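To experiment with the example rows above, they could be loaded into a scratch copy of the source table. This is only a fixture sketch in T-SQL: the table name DuplicatePairs is made up for illustration, since the real table is not named here.

```sql
-- Hypothetical name: the real pair table is not named in the question.
CREATE TABLE DuplicatePairs
(
    A varchar(16) NOT NULL,
    B varchar(16) NOT NULL
);

-- The sample pairs from the example above.
INSERT INTO DuplicatePairs (A, B) VALUES ('1a', '1b');
INSERT INTO DuplicatePairs (A, B) VALUES ('1b', '1c');
INSERT INTO DuplicatePairs (A, B) VALUES ('2a', '2b');
INSERT INTO DuplicatePairs (A, B) VALUES ('2b', '2c');
INSERT INTO DuplicatePairs (A, B) VALUES ('2d', '2a');
```

Any correct solution should put 1a, 1b, 1c under one key and 2a, 2b, 2c, 2d under another, whatever order the rows are read in.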

My attempted solutions

This feels like a job for recursion and therefore a CTE. But the potentially cyclic nature of the source data makes it hard for me to get the anchor case. I’ve tried to pre-clean it into more of an acyclic form, but I still can’t seem to get this right.

I’m also stubbornly trying to do this as a set-based SQL operation, instead of resorting to a cursor and loop. But maybe that’s not possible.

I’ve spent a good 8 hours pondering this and trying different approaches but it keeps slipping away. Any ideas or suggestions on the correct approach, or even some example code?


1 Answer

Editorial Team · Answer 1 · 2026-05-25T20:47:21+00:00

I’m going to do something I haven’t done before, and post an answer to
my own question. I need to give huge thanks to both Beth and JBrooks
for moving me in the right direction. I really wanted to solve this
in a set-based, declarative way. And maybe that’s still possible using
a CTE and recursion. But once I surrendered and said it’s OK for it to
be imperative and iterative, it was much easier to do it.
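For comparison, a mostly set-based route may be possible without a recursive CTE, by propagating labels: every customer number starts out labelled with itself, and each pass lowers a label to the smallest label held by any directly paired number, until a pass changes nothing. The sketch below is not the approach taken in this answer. It assumes SQL Server's T-SQL, a legacy pair table called DuplicatePairs with columns A and B (a hypothetical name), and an empty UniqueCustomers target table as defined in this answer. The names @Label and @changed are illustrative only.

```sql
-- Assumption: the legacy pairs live in DuplicatePairs(A, B); the name is hypothetical.

-- Step 1: every distinct customer number starts out labelled with itself.
DECLARE @Label TABLE
(
    CustNum varchar(16) NOT NULL PRIMARY KEY,
    Label   varchar(16) NOT NULL
);

INSERT INTO @Label (CustNum, Label)
SELECT A, A FROM DuplicatePairs
UNION
SELECT B, B FROM DuplicatePairs;

-- Step 2: lower each label to the smallest label among its direct neighbours,
-- and repeat until a pass updates no rows.
DECLARE @changed int;
SET @changed = 1;

WHILE @changed > 0
BEGIN
    UPDATE l
    SET    Label = m.MinLabel
    FROM   @Label AS l
    JOIN  (
            SELECT e.CustNum, MIN(n.Label) AS MinLabel
            FROM  (
                    -- Treat each pair as an undirected edge.
                    SELECT A AS CustNum, B AS Neighbour FROM DuplicatePairs
                    UNION
                    SELECT B, A FROM DuplicatePairs
                  ) AS e
            JOIN   @Label AS n ON n.CustNum = e.Neighbour
            GROUP BY e.CustNum
          ) AS m ON m.CustNum = l.CustNum
    WHERE  m.MinLabel < l.Label;

    SET @changed = @@ROWCOUNT;
END

-- Step 3: numbers that ended up with the same label share one integer key.
INSERT INTO UniqueCustomers (uid, gpid)
SELECT DENSE_RANK() OVER (ORDER BY Label), CustNum
FROM   @Label;
```

It still loops, but each pass is a single set-based UPDATE, and the number of passes grows with the length of the longest chain of pairs rather than with the number of rows. Because labels only ever get smaller, cycles such as a,b / b,a do not cause trouble.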

Anyway, given this target table from my question (with MasterKey and CustNum renamed to uid and gpid):

CREATE TABLE UniqueCustomers
(
    uid     int NOT NULL,
    gpid    varchar(16) NOT NULL UNIQUE, -- Important: UNIQUE to disallow duplicates
    PRIMARY KEY( uid, gpid ) -- Important: Disallow duplicates
)

I came up with the following stored procedure. It can be called once
for each new pair of dupes as it is reported. It can also be called in
a loop over the legacy table, which stores the dupes as pairs in no
particular order.

CREATE PROCEDURE ReportDuplicateCustomerIDs
(
    @id1 varchar(16),
    @id2 varchar(16)
)
AS
BEGIN
    IF @id1 <> @id2
    BEGIN
        -- Retrieve the uid (if any) for each of the ids
        DECLARE @uid1 int
        SELECT @uid1 = NULL
        SELECT @uid1 = uid FROM UniqueCustomers WHERE gpid = @id1

        DECLARE @uid2 int
        SELECT @uid2 = NULL
        SELECT @uid2 = uid FROM UniqueCustomers WHERE gpid = @id2

        -- If we've seen NEITHER of the ids yet
        IF @uid1 IS NULL AND @uid2 IS NULL
        BEGIN
            -- Add both of them using a brand-new uid
            DECLARE @uidNew int
            SELECT @uidNew = Max(uid) + 1 FROM UniqueCustomers
            IF @uidNew IS NULL
                SET @uidNew = 0
            INSERT INTO UniqueCustomers VALUES( @uidNew, @id1 )
            INSERT INTO UniqueCustomers VALUES( @uidNew, @id2 )
        END
        ELSE
        BEGIN
            -- If we've seen BOTH ids already
            IF @uid1 IS NOT NULL AND @uid2 IS NOT NULL
            BEGIN
                -- If this pair bridges two existing chains.
                IF @uid1 <> @uid2
                BEGIN
                    -- Update everything using uid2 to use uid1 instead.
                    -- Consolidates the two dupe chains into one.
                    UPDATE UniqueCustomers SET uid = @uid1 WHERE uid = @uid2
                END
                -- ELSE nothing to do
            END
            ELSE
            BEGIN
                -- We've seen exactly one of the two ids: insert the new
                -- one using the uid that the known one is already using
                IF @uid1 IS NOT NULL
                    INSERT INTO UniqueCustomers VALUES( @uid1, @id2 )
                ELSE -- @uid2 IS NOT NULL
                    INSERT INTO UniqueCustomers VALUES( @uid2, @id1 )
            END
        END
    END
END
GO
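The loop over the legacy table mentioned above could look something like the following. This is a minimal sketch assuming SQL Server's T-SQL and a legacy table named DuplicatePairs with columns A and B. That table name, the cursor name, and the variable names are placeholders, not taken from the real schema.

```sql
-- Assumption: the legacy pairs live in DuplicatePairs(A, B); the name is hypothetical.
DECLARE @a varchar(16), @b varchar(16);

DECLARE pair_cursor CURSOR LOCAL FAST_FORWARD FOR
    SELECT A, B FROM DuplicatePairs;

OPEN pair_cursor;
FETCH NEXT FROM pair_cursor INTO @a, @b;

-- Feed every legacy pair through the procedure, one row at a time.
WHILE @@FETCH_STATUS = 0
BEGIN
    EXEC ReportDuplicateCustomerIDs @id1 = @a, @id2 = @b;
    FETCH NEXT FROM pair_cursor INTO @a, @b;
END

CLOSE pair_cursor;
DEALLOCATE pair_cursor;

-- Inspect the consolidated result.
SELECT uid, gpid FROM UniqueCustomers ORDER BY uid, gpid;
```

The order in which the pairs are read should not affect which numbers end up grouped together, only which uid each group receives. The uid values start at 0 and can have gaps, because merging two chains retires one of their uids.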
