Collation under the Unicode Technical Standard #10 (UCA), which is a separate thing from


Asked: May 29, 2026


Collation under Unicode Technical Standard #10 (the Unicode Collation Algorithm, UCA), which is a separate thing from being Unicode compliant, in case you were wondering, covers not only ordering/sorting but also comparison: questions of “is string 1 equal to string 2?”. Sometimes strings whose code points differ are to be considered equal for collation and comparison purposes; at least that is implied by this blog post, which discusses the issue from a Perl standard library perspective.

What I want to know is, does (a) Delphi XE2 already fully implement the entire Unicode Collation Spec, and (b) if not, does a third party library do so?

Sample code:

Str1 := Chr($212B); // U+212B ANGSTROM SIGN
Str2 := Chr($C5);   // U+00C5 LATIN CAPITAL LETTER A WITH RING ABOVE
n := CompareStr(Str1, Str2); // non-zero in Delphi; under UCA rules this should be 0

According to the Unicode collation spec, the two code points above should be considered equivalent under comparison. That makes no sense from a binary point of view, so I’m glad that neither CompareStr in Delphi nor cmp in Perl (from the linked article) is polluted with Unicode glitches. But what if you want to do a Unicode-compliant collation in Delphi, like the Perl Unicode::Collate library does? How?

Update: AnsiCompareStr calls the Win32 CompareString function and handles some locale-specific cases like the one above. From reading around the internet, the classic Windows Unicode collation behaviour and UCA are converging slowly but not completely, with UCA seeming to be the one that gets changed to make it more like Windows collation.


1 Answer

Editorial Team · Answer 1 · 2026-05-29T17:19:21+00:00

(a) No. Delphi’s AnsiCompareStr and related routines wrap the Win32 CompareString function, which does not follow the Unicode Collation Algorithm.

(b) The ICU project does support it, but the Delphi wrapper, ICU4PAS, hasn’t been updated since 2007.

That may not be necessary for you, though. You’re seeing this behavior because you’re using CompareStr instead of AnsiCompareStr. The non-ANSI version is written in asm in SysUtils, compares char by char, and doesn’t take equivalence or combining characters into account. The case-insensitive version, CompareText, likewise only folds case for a-z. The ANSI versions call CompareString internally, which is locale-aware and does handle all of those cases.

Note that this is only true for the routines in SysUtils, though. In StrUtils.pas the non-ANSI versions are just inline wrappers around the ANSI ones, so they are all locale-aware.
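
To see the difference between the binary and the locale-aware routines described above, a minimal console sketch along these lines may help. It assumes a Unicode Delphi (2009 or later, so XE2 qualifies) running on Windows; the program name CompareDemo is made up for illustration. No expected output is shown for AnsiCompareStr, because its result comes from CompareString and can depend on the Windows version and the user locale.

```delphi
program CompareDemo; // hypothetical name, for illustration only

{$APPTYPE CONSOLE}

uses
  SysUtils;

var
  Str1, Str2: string;
begin
  Str1 := Chr($212B); // U+212B ANGSTROM SIGN
  Str2 := Chr($C5);   // U+00C5 LATIN CAPITAL LETTER A WITH RING ABOVE

  // Binary comparison of the UTF-16 code units: the values differ,
  // so the result is non-zero.
  Writeln('CompareStr:     ', CompareStr(Str1, Str2));

  // Locale-aware comparison routed through Win32 CompareString.
  // The result depends on the OS and the current locale; check it
  // on your own system rather than assuming 0.
  Writeln('AnsiCompareStr: ', AnsiCompareStr(Str1, Str2));
end.
```

If AnsiCompareStr reports 0 here, that only shows that CompareString treats these two characters as equal on that machine; it does not make the comparison UCA-conformant in general.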
