Please have a look at the SQL below:
create table ChecksumTest (
    id int not null identity,
    string1 varchar(10), string2 varchar(10), string3 varchar(10), string4 varchar(10),
    checksumvalue int
)
insert into ChecksumTest (string1, string2, string3, string4) values ('Ian', 'Marie', 'Sharon', 'Mark')
insert into ChecksumTest (string1, string2, string3, string4) values ('Steven', 'Robert', 'Amy', 'Andy')
insert into ChecksumTest (string1, string2, string3, string4) values ('Sharon', 'Mark', 'Ian', 'Marie')
select distinct checksum1 ^ checksum2 as xor
from (
    select CHECKSUM(string1, string2) as checksum1,
           CHECKSUM(string3, string4) as checksum2
    from ChecksumTest
) as Checksums
The select statement returns two distinct XOR values because rows one and three in the table contain the same two pairs of values, just in swapped order. This is what I expect.
I have run the SELECT statement across about one million rows, and the number of distinct XOR values is lower than I expected. I realise that CHECKSUM is not guaranteed to be unique, but is it safe to use it like this, given that two different sets of CHECKSUM inputs (with multiple parameters, e.g. CHECKSUM(String1, String2)) can generate the same XOR value?
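One source of colliding XOR values that does not depend on the quality of CHECKSUM at all: XOR cancels itself, so any row whose two pairs are identical produces 0, whatever the strings are. The sketch below uses literal values chosen purely for illustration; the column aliases are made up.

```sql
-- x ^ x is always 0, so two unrelated rows can share the XOR value 0.
SELECT CHECKSUM('Ian', 'Marie') ^ CHECKSUM('Ian', 'Marie') AS xor_same_pair_a,
       CHECKSUM('Amy', 'Andy')  ^ CHECKSUM('Amy', 'Andy')  AS xor_same_pair_b;
```

Both columns are 0, even though the underlying strings have nothing in common.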
Is it safer to concatenate the strings together like this: CHECKSUM(String1 + String2), or perhaps to use BINARY_CHECKSUM?
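One thing worth checking with the concatenation variant is that it loses the boundary between the two strings, so different pairs can become the same input before CHECKSUM is even applied. A minimal sketch with illustrative literals (not values from the real table):

```sql
-- Both expressions hash the single string 'abc', so the two results are identical.
SELECT CHECKSUM('ab' + 'c') AS concat_1,
       CHECKSUM('a' + 'bc') AS concat_2;
```

Putting a delimiter that cannot occur in the data between the strings would keep the inputs distinct, although the 32-bit checksums could still collide.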
Update
Every combination of four values appears in two rows:
Row 1: String1, String2, String3, String4
Row 2: String3, String4, String1, String2
I only want to return one row for each combination.
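For comparison, the same goal can be sketched without any checksum by putting each row's two pairs into a fixed order and then applying DISTINCT. This is only a sketch, not the solution referred to in this question: it assumes string1 to string4 are never NULL, and the aliases t, s, swap, first1, first2, second1 and second2 are hypothetical names.

```sql
-- Sketch only: assumes string1..string4 are never NULL.
-- Order the two pairs so the "smaller" pair always comes first,
-- then let DISTINCT collapse the mirrored rows.
SELECT DISTINCT
    CASE WHEN s.swap = 1 THEN t.string3 ELSE t.string1 END AS first1,
    CASE WHEN s.swap = 1 THEN t.string4 ELSE t.string2 END AS first2,
    CASE WHEN s.swap = 1 THEN t.string1 ELSE t.string3 END AS second1,
    CASE WHEN s.swap = 1 THEN t.string2 ELSE t.string4 END AS second2
FROM ChecksumTest AS t
CROSS APPLY (
    -- swap = 1 when (string1, string2) sorts after (string3, string4)
    SELECT CASE
               WHEN t.string1 > t.string3
                 OR (t.string1 = t.string3 AND t.string2 > t.string4)
               THEN 1 ELSE 0
           END AS swap
) AS s;
```

Against the three sample rows above this returns two rows, because rows one and three normalise to the same ordering. String comparison follows the column collation, so whether values differing only in case count as equal depends on that collation.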
I found my solution in the question "How to do bitwise exclusive OR in sql server between two binary types?". It involved splitting the binary values into 8-byte chunks, each stored as binary(8).