I’m trying to write a query that returns the rows matching the most of 5 conditions. If a single row matches all 5, that row takes precedence.
To illustrate my question, I have prepared the following SQL.
declare @tmp table (
id int identity
,field1 nvarchar(60)
,field2 nvarchar(60)
,field3 nvarchar(60)
,field4 nvarchar(60)
,field5 nvarchar(60)
)
insert into @tmp values
('Bob','Jones','Mr','000001','bob@example.com')
insert into @tmp values
('Bill','Jones','','000002','bill@example.com')
insert into @tmp values
('Sue','Jones','Mrs','000003','jones@example.com')
insert into @tmp values
('Adrian','Jones','','000001','jones@example.com')
insert into @tmp values
('Bertha','Jones','Mrs','000001','jones@example.com')
select *
from @tmp
declare @key1 nvarchar(60), @key2 nvarchar(60), @key3 nvarchar(60), @key4 nvarchar(60), @key5 nvarchar(60)
select
@key1 = 'Bertha'
,@key2 = 'Jones'
,@key3 = 'Mrs'
,@key4 = '000001'
,@key5 = 'jones@example.com'
select
*
,case when field1 = @key1 then 1 else 0 end as X1
,case when field2 = @key2 then 1 else 0 end as X2
,case when field3 = @key3 then 1 else 0 end as X3
,case when field4 = @key4 then 1 else 0 end as X4
,case when field5 = @key5 then 1 else 0 end as X5
from @tmp
If you look at the results, you can see that Rows 3 and 4 each match on 3 fields, but Row 5 matches on all 5 fields. Row 5 is therefore an identical match, and that’s the one I want returned.
But if Row 5 hadn’t been inserted, Rows 3 and 4 would be the best matches, and in that case I’d want those returned.
I’ve been trying to think how best to do this. I am using SQL Server 2008, in case that makes any difference.
In the real scenario, they are not all simple case statements as in this example, but sub-selects into other tables.
I’ve looked into group by, and having, but I couldn’t see how I could use them in this scenario.
How can I do a ‘best of’ type match across multiple conditions like this in SQL Server?
If this appears ‘subjective’ as the page is telling me, say so and I’ll delete it. But I don’t think this is subjective as this is a SQL de-duplication scenario that I’d imagine is commonly requested.
Consider rolling your X columns (X1 through X5) into a sum to get the score. Here’s a CTE (Common Table Expression) from which you can query.
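A minimal sketch of such a CTE, assuming the @tmp table and the @key1 to @key5 variables from the question are declared in the same batch. The names scored and score are illustrative only, and the simple case expressions stand in for whatever sub-selects the real scenario uses.

```sql
-- Assumes @tmp and @key1..@key5 from the question exist in this batch.
-- "scored" and "score" are hypothetical names chosen for this sketch.
;with scored as (
    select
        *
        -- Each condition contributes 1 when it matches, 0 otherwise.
        ,case when field1 = @key1 then 1 else 0 end
       + case when field2 = @key2 then 1 else 0 end
       + case when field3 = @key3 then 1 else 0 end
       + case when field4 = @key4 then 1 else 0 end
       + case when field5 = @key5 then 1 else 0 end as score
    from @tmp
)
-- Keep only the rows that share the highest score.
select *
from scored
where score = (select max(score) from scored)
```

With the sample data this returns only Row 5 (score 5). If Row 5 had not been inserted, it would return Rows 3 and 4 (score 3 each). Selecting top 1 with ties from scored, ordered by score descending, should give the same rows if you prefer to avoid the subquery.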