I don’t really know what to call this, but it’s not that hard to explain.
Basically, what I have is a result like this:
Similarity  ColumnA    ColumnB  ColumnC
1           SomeValue  NULL     SomeValue
2           NULL       SomeB    NULL
3           SomeValue  NULL     SomeC
4           SomeA      NULL     NULL
This result is created by matching a set of strings against another table. Each string also carries values for ColumnA..C, which are the values I want to aggregate in some way.
Something like min/max works very well, but I can’t figure out how to get it to account for the highest similarity rather than just the min/max value. I don’t really want the min/max; for each column, I want the first non-null value with the highest similarity.
Ideally the result would look like this:
ColumnA  ColumnB  ColumnC
SomeA    SomeB    SomeC
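
To make the goal concrete, here is a minimal sketch that reproduces the sample rows and contrasts a plain MAX with the result I’m after. The table variable @Matches is a hypothetical name used only for illustration, and the second query is a reference for the intended semantics (one ordered lookup per column), not necessarily an efficient way to get there.

```sql
-- Hypothetical table variable holding the sample rows shown above.
DECLARE @Matches TABLE (
    Similarity int         NOT NULL,
    ColumnA    varchar(20) NULL,
    ColumnB    varchar(20) NULL,
    ColumnC    varchar(20) NULL
);

INSERT INTO @Matches (Similarity, ColumnA, ColumnB, ColumnC) VALUES
    (1, 'SomeValue', NULL,    'SomeValue'),
    (2, NULL,        'SomeB', NULL),
    (3, 'SomeValue', NULL,    'SomeC'),
    (4, 'SomeA',     NULL,    NULL);

-- Plain MAX skips NULLs but ranks by the values themselves, so Similarity
-- plays no part in which value is returned.
SELECT MAX(ColumnA) AS ColumnA,
       MAX(ColumnB) AS ColumnB,
       MAX(ColumnC) AS ColumnC
FROM @Matches;

-- Intended semantics: per column, the non-NULL value from the row with the
-- highest Similarity. For the sample rows this yields SomeA, SomeB, SomeC.
SELECT
    (SELECT TOP (1) m.ColumnA FROM @Matches AS m
      WHERE m.ColumnA IS NOT NULL ORDER BY m.Similarity DESC) AS ColumnA,
    (SELECT TOP (1) m.ColumnB FROM @Matches AS m
      WHERE m.ColumnB IS NOT NULL ORDER BY m.Similarity DESC) AS ColumnB,
    (SELECT TOP (1) m.ColumnC FROM @Matches AS m
      WHERE m.ColumnC IS NOT NULL ORDER BY m.Similarity DESC) AS ColumnC;
```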
I’d like to be able to efficiently join in the temporary result to compute the rest, and I’ve been exploring different options. One thing I’ve been considering is creating a SQL Server CLR aggregate that yields the “first” non-null value, but I’m unsure whether there’s even such a thing as first or last when running an aggregate over a result.
Okay, so I figured it out. I originally had trouble with UPDATE FROM and JOIN not playing well together. I was counting on the UPDATE occurring multiple times for the same row, which would give me the correct results. However, there’s no such guarantee from SQL Server (it’s actually undefined behavior, and although it appeared to work, we’ll have none of that). Since you can run UPDATE against a CTE, I combined that with OUTER APPLY to select exactly one row to complement a missing value, if possible.
Here’s the whole thing with test data as well.
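
What follows is a minimal sketch of that approach, written under assumptions rather than copied from a real schema: the temp tables #Target and #Matches, the Id/TargetId key columns, and the Best* aliases are all hypothetical names. Each OUTER APPLY returns at most one row per target row, so the CTE has exactly one row per target and the UPDATE touches each row once.

```sql
-- Hypothetical schema and test data; all names are illustrative.
CREATE TABLE #Target (
    Id      int         NOT NULL PRIMARY KEY,
    ColumnA varchar(20) NULL,
    ColumnB varchar(20) NULL,
    ColumnC varchar(20) NULL
);

CREATE TABLE #Matches (
    TargetId   int         NOT NULL,
    Similarity int         NOT NULL,
    ColumnA    varchar(20) NULL,
    ColumnB    varchar(20) NULL,
    ColumnC    varchar(20) NULL
);

INSERT INTO #Target (Id) VALUES (1);

INSERT INTO #Matches (TargetId, Similarity, ColumnA, ColumnB, ColumnC) VALUES
    (1, 1, 'SomeValue', NULL,    'SomeValue'),
    (1, 2, NULL,        'SomeB', NULL),
    (1, 3, 'SomeValue', NULL,    'SomeC'),
    (1, 4, 'SomeA',     NULL,    NULL);

-- One OUTER APPLY per column: pick the single best non-NULL candidate,
-- or NULL if there is none. Ties on Similarity are broken arbitrarily here.
WITH Filled AS (
    SELECT t.ColumnA, t.ColumnB, t.ColumnC,
           a.ColumnA AS BestA,
           b.ColumnB AS BestB,
           c.ColumnC AS BestC
    FROM #Target AS t
    OUTER APPLY (SELECT TOP (1) m.ColumnA FROM #Matches AS m
                  WHERE m.TargetId = t.Id AND m.ColumnA IS NOT NULL
                  ORDER BY m.Similarity DESC) AS a
    OUTER APPLY (SELECT TOP (1) m.ColumnB FROM #Matches AS m
                  WHERE m.TargetId = t.Id AND m.ColumnB IS NOT NULL
                  ORDER BY m.Similarity DESC) AS b
    OUTER APPLY (SELECT TOP (1) m.ColumnC FROM #Matches AS m
                  WHERE m.TargetId = t.Id AND m.ColumnC IS NOT NULL
                  ORDER BY m.Similarity DESC) AS c
)
-- Only the #Target columns are modified; existing values are kept and
-- missing ones are complemented from the best match.
UPDATE Filled
SET ColumnA = COALESCE(ColumnA, BestA),
    ColumnB = COALESCE(ColumnB, BestB),
    ColumnC = COALESCE(ColumnC, BestC);

SELECT Id, ColumnA, ColumnB, ColumnC FROM #Target;

DROP TABLE #Matches;
DROP TABLE #Target;
```

With the test rows above, the target row ends up with SomeA, SomeB, and SomeC, matching the desired result from the question.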