Question

Asked: May 14, 2026 (2026-05-14T21:22:56+00:00)

We are testing our application for Unicode compatibility and have been selecting random characters outside the Latin character set for testing.

On both Latin and Japanese-collated systems the following equality is true (U+3422):

N'㐢㐢㐢㐢' = N'㐢㐢㐢'

but the following is not (U+30C1):

N'チチチチ' = N'チチチ'

This was discovered when a test case using the first example (using U+3422) violated a unique index. Do we need to be more selective about the characters we use for testing? Obviously we don’t know the semantic meaning of the above comparisons. Would this behavior be obvious to a native speaker?

1 Answer

Editorial Team · Answer 1 · 2026-05-14T21:22:57+00:00

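Michael Kaplan has a blog post explaining how Unicode strings are compared. It comes down to weights: characters that have no weight in the collation are ignored during comparison, so a string made up only of such characters is considered equal to the empty string. That is why four and three repetitions of U+3422 compare as equal, while the same test with U+30C1, which does have a weight, does not.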

Sorting it all Out: The jury will give this string no weight
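A quick way to check whether a test character carries any weight is to compare it with the empty string. The following is a minimal sketch, not a definitive test: it runs under whatever collation the current database uses, and the column aliases are illustrative only.

```sql
-- Compare a single test character with the empty string under the
-- current database's default collation.
SELECT
    CASE WHEN N'㐢' = N'' THEN 'no weight (equals empty string)'
         ELSE 'has weight'
    END AS weight_check,
    DATALENGTH(N'㐢') AS byte_length;  -- storage size in bytes, independent of collation
```

If the first column reports no weight while the byte length is non-zero, the character is stored correctly but ignored in comparisons under that collation.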

In SQL Server this weight is determined by the collation in effect. Microsoft added appropriate collations for CJK Unified Ideographs in Windows XP/2003 and SQL Server 2005. This post recommends using Chinese_Simplified_Pinyin_100_CI_AS or Chinese_Simplified_Stroke_Order_100_CI_AS on SQL Server 2008:

You can always use any binary and binary2 collations although it wouldn’t give you Linguistic correct result. For SQL Server 2005, you SHOULD use Chinese_PRC_90_CI_AS or Chinese_PRC_Stroke_90_CI_AS which support surrogate pair comparison (but not linguistic). For SQL Server 2008, you should use Chinese_Simplified_Pinyin_100_CI_AS and Chinese_Simplified_Stroke_Order_100_CI_AS which have better linguistic surrogate comparison. I do suggest you use these collation as your server/database/table collation instead of passing the collation name during comparison.

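With such a collation the following SQL statement works as expected: the ideograph is no longer treated as equal to the empty string, so the condition should be false and no rows are returned: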

select * from MyTable where N'' = N'㐀' COLLATE Chinese_Simplified_Stroke_Order_100_CI_AS;
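Following the quoted advice to set the collation on the column rather than on each comparison, the unique-index scenario from the question could be sketched as below. The table, column, and index names are hypothetical, and the collation requires SQL Server 2008 or later; treat it as an illustration rather than a verified fix for any particular schema.

```sql
-- Hypothetical table and index names, for illustration only.
-- The collation is declared on the column, so the unique index
-- compares values using that collation's weights.
CREATE TABLE dbo.UnicodeTest (
    Id    INT IDENTITY(1,1) PRIMARY KEY,
    Label NVARCHAR(50) COLLATE Chinese_Simplified_Stroke_Order_100_CI_AS NOT NULL
);

CREATE UNIQUE INDEX UX_UnicodeTest_Label ON dbo.UnicodeTest (Label);

-- The two test strings from the question. Under a collation that
-- assigns U+3422 a weight, these should count as distinct keys.
INSERT INTO dbo.UnicodeTest (Label) VALUES (N'㐢㐢㐢㐢');
INSERT INTO dbo.UnicodeTest (Label) VALUES (N'㐢㐢㐢');
```

If the second insert still fails with a duplicate-key error, the chosen collation does not assign the character a weight, and a different collation (or a binary one, at the cost of linguistic ordering) would be needed.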

A list of all supported collations can be found on MSDN:

SQL Server 2008 Books Online: Windows Collation Name
