I don’t really know what to call this but it’s not that hard to

Question

Editorial Team

Asked: May 21, 2026 (2026-05-21T22:26:57+00:00)

I don’t really know what to call this, but it’s not that hard to explain.

Basically, what I have is a result like this:

Similarity ColumnA   ColumnB   ColumnC
1          SomeValue NULL      SomeValue
2          NULL      SomeB     NULL
3          SomeValue NULL      SomeC
4          SomeA     NULL      NULL

This result is created by matching a set of strings against another table. Each string also carries values for ColumnA..C, which are the values I want to aggregate in some way.

Something like MIN/MAX works very well, but I can’t figure out how to make it account for the highest similarity rather than just the smallest or largest value. I don’t really want the min/max; for each column I want the first non-null value with the highest similarity.

Ideally the result would look like this:

ColumnA   ColumnB   ColumnC
SomeA     SomeB     SomeC
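
One way to get this without a custom aggregate is a correlated subquery per column. The following is a minimal sketch, assuming the intermediate result lives in a table variable here called @matches (a hypothetical name) and that a higher Similarity means a better match:

```sql
-- Hypothetical table variable holding the intermediate result from the question.
DECLARE @matches TABLE (
    Similarity int NOT NULL,
    ColumnA nvarchar(100) NULL,
    ColumnB nvarchar(100) NULL,
    ColumnC nvarchar(100) NULL
);

INSERT @matches VALUES
    (1, N'SomeValue', NULL,     N'SomeValue'),
    (2, NULL,         N'SomeB', NULL),
    (3, N'SomeValue', NULL,     N'SomeC'),
    (4, N'SomeA',     NULL,     NULL);

-- For each column, pick the non-null value from the row with the highest Similarity.
SELECT
    (SELECT TOP (1) m.ColumnA FROM @matches m
     WHERE m.ColumnA IS NOT NULL ORDER BY m.Similarity DESC) AS ColumnA
    , (SELECT TOP (1) m.ColumnB FROM @matches m
       WHERE m.ColumnB IS NOT NULL ORDER BY m.Similarity DESC) AS ColumnB
    , (SELECT TOP (1) m.ColumnC FROM @matches m
       WHERE m.ColumnC IS NOT NULL ORDER BY m.Similarity DESC) AS ColumnC
;
-- Returns one row: SomeA, SomeB, SomeC
```

Each subquery scans the table once, so on a large result an index on Similarity would probably matter.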

I’d like to be able to efficiently join in the temporary result to compute the rest, and I’ve been exploring different options. One thing I’ve been considering is creating a SQL Server CLR aggregate that yields the “first” non-null value, but I’m unsure whether there is even such a thing as a first or last when running an aggregate over a result.
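
On the question of ordering: built-in aggregates such as MIN and MAX treat their input as an unordered set, so there is no inherent first or last row. A common workaround, sketched here under assumptions, is to make MAX order-aware by prefixing each value with a zero-padded Similarity and stripping the prefix afterwards. The @matches table and the UnitId grouping key are hypothetical, and Similarity is assumed to be a non-negative int:

```sql
-- Hypothetical table: UnitId stands in for whatever key the result is grouped by.
DECLARE @matches TABLE (
    UnitId int NOT NULL,
    Similarity int NOT NULL,      -- assumed non-negative
    ColumnA nvarchar(100) NULL,
    ColumnB nvarchar(100) NULL,
    ColumnC nvarchar(100) NULL
);

INSERT @matches VALUES
    (1, 1, N'SomeValue', NULL,     N'SomeValue'),
    (1, 2, NULL,         N'SomeB', NULL),
    (1, 3, N'SomeValue', NULL,     N'SomeC'),
    (1, 4, N'SomeA',     NULL,     NULL);

-- Prefix each value with a 10-digit, zero-padded Similarity so that MAX picks
-- the value from the most similar row. A NULL value makes the whole
-- concatenation NULL, and MAX skips NULLs. SUBSTRING then drops the prefix.
SELECT
    UnitId
    , SUBSTRING(MAX(RIGHT('0000000000' + CAST(Similarity AS varchar(10)), 10) + ColumnA), 11, 100) AS ColumnA
    , SUBSTRING(MAX(RIGHT('0000000000' + CAST(Similarity AS varchar(10)), 10) + ColumnB), 11, 100) AS ColumnB
    , SUBSTRING(MAX(RIGHT('0000000000' + CAST(Similarity AS varchar(10)), 10) + ColumnC), 11, 100) AS ColumnC
FROM @matches
GROUP BY UnitId
;
-- Returns one row for UnitId 1: SomeA, SomeB, SomeC
```

Because this is an ordinary GROUP BY, it can be joined to other tables like any aggregate query. If two rows tie on Similarity, the larger value by collation order wins.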

1 Answer

Editorial Team · Answer 1 · 2026-05-21T22:26:58+00:00

Okay, so I figured it out. I originally had trouble with UPDATE ... FROM and JOIN not playing well together. I was counting on the UPDATE being applied once per joined row, which would have given me the correct results. However, SQL Server gives no such guarantee: when several joined rows match the same target row, the outcome is undefined, and although it appeared to work, we’ll have none of that. Since you can run UPDATE against a CTE, I combined that with OUTER APPLY to select exactly one row to fill in each missing value where possible.

Here’s the whole thing with test data as well.

DECLARE @cost TABLE (
    make nvarchar(100) not null,
    model nvarchar(100),
    a numeric(18,2),
    b numeric(18,2)
);

INSERT @cost VALUES ('a%', null, 100, 2);
INSERT @cost VALUES ('a%', 'a%', 149, null);
INSERT @cost VALUES ('a%', 'ab', 349, null);
INSERT @cost VALUES ('b', null, null, 2.5);
INSERT @cost VALUES ('b', 'b%', 249, null);
INSERT @cost VALUES ('b', 'b', null, 3);

DECLARE @unit TABLE (
    id int,
    make nvarchar(100) not null,
    model nvarchar(100)
);

INSERT @unit VALUES (1, 'a', null);
INSERT @unit VALUES (2, 'a', 'a');
INSERT @unit VALUES (3, 'a', 'ab');
INSERT @unit VALUES (4, 'b', null);
INSERT @unit VALUES (5, 'b', 'b');

DECLARE @tmp TABLE (
    id int,
    specificity int,
    a numeric(18,2),
    b numeric(18,2),
    primary key(id, specificity)
);

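-- Rank every matching cost row per unit; specificity 1 is the preferred match.
-- Sorting DESC puts NULL models last, because SQL Server sorts NULL lowest.
INSERT @tmp 
OUTPUT inserted.* --FOR DEBUGGING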
SELECT 
    unit.id
    , ROW_NUMBER() OVER (
        PARTITION BY unit.id 
        ORDER BY cost.make DESC, cost.model DESC
    ) AS specificity
    , cost.a
    , cost.b
FROM @unit unit
INNER JOIN @cost cost ON unit.make LIKE cost.make
    AND (cost.model IS NULL OR unit.model LIKE cost.model)
;

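-- Fill the holes: for each unit's top-ranked row, take a missing a or b
-- from the next most specific row that has a non-null value.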
WITH tmp AS (
    SELECT * 
    FROM @tmp 
    WHERE specificity = 1 
        AND (a IS NULL OR b IS NULL) --where necessary
)
UPDATE tmp
SET 
    tmp.a = COALESCE(tmp.a, a.a)
    , tmp.b = COALESCE(tmp.b, b.b)
OUTPUT inserted.* --FOR DEBUGGING
FROM tmp
OUTER APPLY ( 
    -- most specific lower-ranked row that has a value for a
    SELECT TOP (1) a.a 
    FROM @tmp a 
    WHERE a.id = tmp.id 
        AND a.specificity > 1 
        AND a.a IS NOT NULL 
    ORDER BY a.specificity
    ) a
OUTER APPLY ( 
    -- most specific lower-ranked row that has a value for b
    SELECT TOP (1) b.b 
    FROM @tmp b 
    WHERE b.id = tmp.id 
        AND b.specificity > 1 
        AND b.b IS NOT NULL 
    ORDER BY b.specificity
    ) b
;
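
If the ranked rows do not need to be modified in place, the same lookup can also be written as a plain SELECT. This is a sketch of an alternative rather than the method used above; it assumes a @tmp table with the same shape, seeded here with a few illustrative rows:

```sql
-- Same shape as the @tmp table above, with illustrative rows for two units.
DECLARE @tmp TABLE (
    id int,
    specificity int,
    a numeric(18,2),
    b numeric(18,2),
    primary key(id, specificity)
);

INSERT @tmp VALUES
    (2, 1, 149,  NULL),
    (2, 2, 100,  2),
    (5, 1, 249,  NULL),
    (5, 2, NULL, 3),
    (5, 3, NULL, 2.5);

-- Start from each unit's top-ranked row and fall back, per column, to the
-- next most specific row that has a non-null value.
SELECT
    t.id
    , COALESCE(t.a, fa.a) AS a
    , COALESCE(t.b, fb.b) AS b
FROM @tmp t
OUTER APPLY (
    SELECT TOP (1) x.a
    FROM @tmp x
    WHERE x.id = t.id AND x.specificity > 1 AND x.a IS NOT NULL
    ORDER BY x.specificity
    ) fa
OUTER APPLY (
    SELECT TOP (1) x.b
    FROM @tmp x
    WHERE x.id = t.id AND x.specificity > 1 AND x.b IS NOT NULL
    ORDER BY x.specificity
    ) fb
WHERE t.specificity = 1
ORDER BY t.id
;
-- Returns: (2, 149.00, 2.00) and (5, 249.00, 3.00)
```

This form leaves @tmp untouched and can be joined directly into a larger query.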
