I’m using ICU with Lithuanian ( lt_LT ) language. The alphabet for this language

Question

Asked: June 4, 2026 (2026-06-04T05:07:07+00:00)

I’m using ICU with the Lithuanian (lt_LT) locale. The alphabet for this language is the following: a ą b c č d e ę ė <...> v z ž

However, when sorting, ICU’s collator seems to treat, for example, a and ą (a with ogonek) as equivalent, so a list of Lithuanian words gets sorted like this:

a, ą, ab, aba, abadas, <...>, b, ba, <...>

When the expected result would be:

a, ab, aba, abadas, <...>, ą, <...>, b, ba, <...>

The same happens with other “accented” letters (e – ę – ė, z – ž, etc.)

A more specific test case: running source/samples/coll/coll -locale lt_LT -source ą -target aa reports that source is less than target, which I would not expect (see coll.cpp if you need to).

Is this behavior expected? Is this a bug or a feature? If it is expected, how can I prevent ICU’s collator from grouping “similar” letters together?

1 Answer

Editorial Team · Answer 1 · 2026-06-04T05:07:08+00:00

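This is expected. The CLDR tailoring for Lithuanian lists these letters as a secondary difference from their base letters, so ą differs from a only at the accent level. Words are ordered by their base letters first, and the accent is consulted only to break a tie between otherwise identical words. That is why ą sorts before aa and ab. If you think this is wrong for Lithuanian, raise it with CLDR; it is not an ICU problem. Mimer agrees.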
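To make the effect of a "secondary difference" concrete, here is a toy two-level comparison in C++. It is only a sketch of the idea, not ICU's implementation: the `Weight` struct, the numeric weights, and the `weightOf` and `compareWords` helpers are all made up for illustration, and only a, ą and b get meaningful weights. With `accentIsPrimary` set to false, ą shares the primary weight of a (the behaviour described in the question); with it set to true, ą gets its own primary weight between a and b (the order the question expected).

```cpp
#include <string>
#include <vector>

// Toy (primary, secondary) weights. These are NOT ICU's real collation
// elements; the numbers are arbitrary and chosen only for this sketch.
struct Weight {
    int primary;
    int secondary;
};

// Hypothetical helper: weights for a handful of letters.
// accentIsPrimary == false: a-ogonek shares the primary weight of 'a' and
//                           differs only at the secondary (accent) level.
// accentIsPrimary == true:  a-ogonek is a separate letter between 'a' and 'b'.
Weight weightOf(char32_t c, bool accentIsPrimary) {
    switch (c) {
        case U'a':
            return {10, 0};
        case U'\u0105':  // a with ogonek
            return accentIsPrimary ? Weight{15, 0} : Weight{10, 1};
        case U'b':
            return {20, 0};
        default:
            // Anything else sorts after the letters above, by code point.
            return {static_cast<int>(c) + 1000, 0};
    }
}

// Returns a negative value, zero, or a positive value, like strcmp.
// All primary weights are compared first; secondary weights are used
// only when the primary sequences are identical.
int compareWords(const std::u32string& s, const std::u32string& t,
                 bool accentIsPrimary) {
    std::vector<int> p1, p2, s1, s2;
    for (char32_t c : s) {
        Weight w = weightOf(c, accentIsPrimary);
        p1.push_back(w.primary);
        s1.push_back(w.secondary);
    }
    for (char32_t c : t) {
        Weight w = weightOf(c, accentIsPrimary);
        p2.push_back(w.primary);
        s2.push_back(w.secondary);
    }
    // std::vector comparison is lexicographic, so a prefix sorts first.
    if (p1 != p2) return p1 < p2 ? -1 : 1;
    if (s1 != s2) return s1 < s2 ? -1 : 1;
    return 0;
}
```

In this model, comparing ą with aa at the secondary setting compares the primary sequences [a] and [a, a], and the shorter one wins, so ą comes first. In ICU itself, the usual way to change this kind of ordering is to build a RuleBasedCollator from custom rules in which ą is a primary difference from a (for example a rule along the lines of `&a < ą`); treat that as a pointer to the relevant feature rather than a tested recipe for lt_LT.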
