Can anyone help with a SQL problem Im having whereby I need to merge

Question

0

Asked: June 9, 20262026-06-09T17:32:03+00:00 2026-06-09T17:32:03+00:00

Can anyone help with a SQL problem Im having whereby I need to merge

0

Can anyone help with a SQL problem Im having whereby I need to merge n number of rows into one record. The individual records may or may not have fields populated that others do.

Basically I have an issue where duplicate records have been created in SQL. Some contain information others dont. I need to merge them (I can rank them), updating a field if the value doesn’t exist in the former record (starting with highest ranked first).

For instance, if I have two user records, one has last name populated, the other has first name. These are duplicates and need to be merged into one record, like a coalesce. However, the are n number of rows.

Its essentially a transpose of many records into one, where a field is updated only if a duplicate record lower in rank has that field populated and the field doesn’t exist in the higher ranked row.

Here is a very simplified version of the issue. As you can see, using SQL Fiddle the script creates 6 records. These records should be merged into 2 records and all the fields filled in.

The problem is, there could be x number of rows. I cannot use a coalesce statement as there is variance to the number of rows.

Hope that make sense?

CREATE TABLE [dbo].[Employee]([EmployeeId] varchar(10) NULL,
[First Name] [varchar](30) NULL,
[Middle Name] [varchar](30) NOT NULL,
[Last Name] [varchar](30) NOT NULL,
[E-Mail] [varchar](80) NOT NULL)


insert into Employee(EmployeeId,[First Name],[Middle Name],[Last Name],[E-Mail])
values('BOB1','Bob','','','bob@hotmail.com');

insert into Employee(EmployeeId,[First Name],[Middle Name],[Last Name],[E-Mail])
values('BOB1','','John','','bob@hotmail.com');

insert into Employee(EmployeeId,[First Name],[Middle Name],[Last Name],[E-Mail])
values('BOB1','','','Smith','bob@hotmail.com');

insert into Employee(EmployeeId,[First Name],[Middle Name],[Last Name],[E-Mail])
values('MARK1','','Peter','','mark@hotmail.com');

insert into Employee(EmployeeId,[First Name],[Middle Name],[Last Name],[E-Mail])
values('MARK1','Mark','','','mark@hotmail.com');

insert into Employee(EmployeeId,[First Name],[Middle Name],[Last Name],[E-Mail])
values('MARK1','','','Davis','mark@hotmail.com');


select * from [Employee]

Hope that makes sense.

Thanks

Report

Leave an answer
Cancel reply

You must login to add an answer.

Need An Account,

1 Answer

Editorial Team · Answer 1 · 2026-06-09T17:32:05+00:00

If the performance is important enough to justify a couple of hours of coding and you are allowed to use SQLCLR, you can calculate all values in a single table scan with multi-parameter User Defined Aggregare.

Here’s an example of an aggregate that returns lowest-ranked non-NULL string:

using System;
using System.Data;
using System.Data.SqlClient;
using System.Data.SqlTypes;
using System.IO;
using Microsoft.SqlServer.Server;

[Serializable]
[SqlUserDefinedAggregate(Format.UserDefined, MaxByteSize = -1, IsNullIfEmpty = true)]
public struct LowestRankString : IBinarySerialize
{
    public int currentRank;
    public SqlString currentValue;

    public void Init()
    {
        currentRank = int.MaxValue;
        currentValue = SqlString.Null;
    }

    public void Accumulate(int Rank, SqlString Value)
    {
        if (!Value.IsNull)
        {
            if (Rank <= currentRank)
            {
                currentRank = Rank;
                currentValue = Value;
            }
        }
    }

    public void Merge(LowestRankString Group)
    {
        Accumulate(Group.currentRank, Group.currentValue);
    }

    public SqlString Terminate()
    {
        return currentValue;
    }

    public void Read(BinaryReader r)
    {
        currentRank = r.ReadInt32();
        bool hasValue = r.ReadBoolean();
        if (hasValue)
        {
            currentValue = new SqlString(r.ReadString());
        }
        else
        {
            currentValue = SqlString.Null;
        }
    }

    public void Write(BinaryWriter w)
    {
        w.Write(currentRank);

        bool hasValue = !currentValue.IsNull;
        w.Write(hasValue);
        if (hasValue)
        {
            w.Write(currentValue.Value);
        }
    }
}

Assuming your table looks something like this:

CREATE TABLE TopNonNullRank (
Id INT NOT NULL,
UserId NVARCHAR (32) NOT NULL,
Value1 NVARCHAR (128) NULL,
Value2 NVARCHAR (128) NULL,
Value3 NVARCHAR (128) NULL,
Value4 NVARCHAR (128) NULL,
PRIMARY KEY CLUSTERED (Id ASC)
);

INSERT INTO TopNonNullRank (Id, UserId, Value1, Value2, Value3, Value4) VALUES 
    (1, N'Ada', NULL, N'Top value 2 for A', N'Top value 3 for A', NULL),
    (2, N'Ada', N'Top value 1 for A', NULL, N'Other value 3', N'Top value 4 for A'),
    (3, N'Ada', N'Other value 1 for A', N'Other value 2 for A', N'Other value 3 for A', NULL),
    (4, N'Bob', N'Top value 1 for B', NULL, NULL, NULL),
    (5, N'Bob', NULL, NULL, NULL, N'Top value 4 for B'),
    (6, N'Bob', N'Other value 1 for B', N'Top value 2 for B', NULL, N'Other value 4');

The following simple query returns top non-NULL value for each column.

SELECT 
    UserId,
    dbo.LowestRankString(Id, Value1) AS TopValue1,
    dbo.LowestRankString(Id, Value2) AS TopValue2,
    dbo.LowestRankString(Id, Value3) AS TopValue3,
    dbo.LowestRankString(Id, Value4) AS TopValue4
FROM TopNonNullRank
GROUP BY UserId

The only thing left is merging the results back to the original table. The simplest way would be something like this:

WITH TopValuesPerUser AS
(
    SELECT 
        UserId,
        dbo.LowestRankString(Id, Value1) AS TopValue1,
        dbo.LowestRankString(Id, Value2) AS TopValue2,
        dbo.LowestRankString(Id, Value3) AS TopValue3,
        dbo.LowestRankString(Id, Value4) AS TopValue4
    FROM TopNonNullRank
    GROUP BY UserId
)
UPDATE TopNonNullRank
SET
    Value1 = TopValue1,
    Value2 = TopValue2,
    Value3 = TopValue3,
    Value4 = TopValue4
FROM TopNonNullRank AS OriginalTable
LEFT JOIN TopValuesPerUser ON TopValuesPerUser.UserId = OriginalTable.UserId;

Note that this update still leaves you with duplicate rows, and you would need to get rid of them.

You could also get more fancy and store the results of this query into a temporary table, and then use MERGE statement to apply them to the original table.

Another option would be to store the results in a new table, and then swap it with the original table using sp_rename stored proc.

Sign Up

Sign In

Forgot Password

The Archive Base Latest Questions

Can anyone help with a SQL problem Im having whereby I need to merge

Leave an answerCancel reply

1 Answer

Leave an answer
Cancel reply