This query:
select *
from op.tag
where tag = 'fussball'
Returns a result which has a tag column value of “fußball”. Column “tag” is defined as nvarchar(150).
While I understand these are two spellings of the same word, can anyone explain and defend this behavior? I assume it is related to the same collation settings that let you change case sensitivity on a column/table, but who would want this behavior? With a unique constraint on the column, inserting one value also fails with a constraint violation when the other already exists. How do I turn this off?
Follow-up bonus point question. Explain why this query does not return any rows:
select 1
where 'fußball' = 'fussball'
Bonus question (answer?): @ScottCher pointed out to me privately that this is due to the string literal “fussball” being treated as a varchar. This query DOES return a result:
select 1
where 'fußball' = cast('fussball' as nvarchar)
But then again, this one does not:
select 1
where cast('fußball' as varchar) = cast('fussball' as varchar)
I’m confused.
I guess the Unicode collation set for your connection/table/database specifies that ss == ß. The latter behavior (the varchar comparisons returning no rows) would be because it's on a faulty fast path, or maybe it does a binary comparison, or maybe you're not passing in the ß in the right encoding (I agree it's stupid).
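To make the collation guess concrete, here is a minimal sketch that forces a collation on the comparison itself. It assumes SQL Server; Latin1_General_CI_AS and Latin1_General_BIN2 are only example collation names and are not taken from the question, so substitute whatever fits your database. Under those assumptions the first query should return a row and the second should not.

```sql
-- Sketch only: assumes SQL Server. Both collation names are examples,
-- not the settings of the asker's database.

-- Linguistic Unicode collation: ß is treated as equal to ss.
select 1
where N'fußball' = N'fussball' collate Latin1_General_CI_AS

-- Binary collation: strings are compared by code point, so they differ.
select 1
where N'fußball' = N'fussball' collate Latin1_General_BIN2

-- The lookup from the question, with the binary comparison applied
-- to this one predicate instead of changing the column definition.
select *
from op.tag
where tag = N'fussball' collate Latin1_General_BIN2
```

Note that a binary collation also makes the comparison case- and accent-sensitive, which may be more than you want. A unique constraint follows the collation of the column, so a per-query collate clause does not change which inserts are rejected; for that, the column itself would need a different collation.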
http://unicode.org/reports/tr10/#Searching mentions that U+00DF is special-cased. Here’s an insightful excerpt: