In my database I have a translation table that contains a dictionary for converting unusual Unicode characters to English characters. The Unicode character is the primary key of this table.
Some time ago I ran into an issue: some different Unicode characters are treated as the same by T-SQL, and at the same time they are equal to the empty string.
I can find a way to distinguish one from another ('=' is useless) and even managed to insert one of them into the database. But I can't insert more than one because of the primary key constraint, since they all compare as equal.
I have discovered just 4: Ș ș Ț ț. But 4 is enough to spoil my system.
Here is a short but informative example of how they behave:
DECLARE @Strings TABLE(id int, ucode nvarchar(50))
INSERT INTO @Strings (id, ucode)
SELECT 1, N'A' UNION -- Usual char
SELECT 2, N'Ы' UNION -- Some unicode char
SELECT 3, N'Ф' UNION -- Another unicode char
SELECT 5, N' ' UNION -- space
SELECT 6, N'Ș' UNION -- Unusual unicode char
SELECT 7, N'Ț' UNION -- Unusual unicode char
SELECT 8, N'some_string' UNION -- example string
SELECT 9, N'some_string ' UNION -- example string with space
SELECT 10, N'some_string Ș' UNION -- example string with unusual char
SELECT 11, N'some_string Ț' -- one more
SELECT * FROM @Strings
SELECT * FROM @Strings WHERE ucode = N'A' -- Good one (1 result)
SELECT * FROM @Strings WHERE ucode = N'Ș' -- Magic (3 results)
SELECT * FROM @Strings WHERE ucode = N'Ț' -- Magic (3 results)
SELECT * FROM @Strings WHERE ucode = '' -- Magic (3 results)
SELECT * FROM @Strings WHERE ucode = 'some_string' -- Magic (4 results)
Do you have any suggestions?
= isn't useless, but you need to specify how = should compare. The default is set at the database level, and your column does not specify different comparison rules, so your column gets the database's comparison rules. Primary keys use the same comparison rules as =, so fixing it for one also makes the other work as you intend it to.
Specifying the comparison rules is done with the COLLATE keyword. One collation that should treat all code points as distinct characters is Latin1_General_BIN2.
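As a rough sketch of how this could look, the table variable from the question can declare the collation on the column itself, so both = and the primary key compare by code point. The PRIMARY KEY on ucode is an assumption added here to mirror the real translation table, and the row ids are made up for illustration.

```sql
-- Sketch only: the same kind of table variable as in the question, but with
-- an explicit binary collation on the column (PRIMARY KEY added as an assumption).
DECLARE @Strings TABLE
(
    id    int,
    ucode nvarchar(50) COLLATE Latin1_General_BIN2 PRIMARY KEY
)

-- Under a binary collation these are four distinct keys,
-- so the insert should not hit a primary key violation.
INSERT INTO @Strings (id, ucode)
SELECT 1, N'Ș' UNION ALL
SELECT 2, N'ș' UNION ALL
SELECT 3, N'Ț' UNION ALL
SELECT 4, N'ț'

SELECT * FROM @Strings WHERE ucode = N'Ș' -- only the row with id = 1
SELECT * FROM @Strings WHERE ucode = N''  -- no rows
```

If changing the column definition is not an option, the collation can also be applied to a single comparison, for example WHERE ucode = N'Ș' COLLATE Latin1_General_BIN2. Note that a binary collation is also case-sensitive and accent-sensitive, and that = still ignores trailing spaces, so 'some_string' and 'some_string ' will probably keep comparing as equal.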