Some characters such as the Unicode Character ‘LATIN SMALL LETTER C WITH CARON’ can


Asked: May 24, 2026 (2026-05-24T07:48:51+00:00)


Some characters, such as the Unicode character ‘LATIN SMALL LETTER C WITH CARON’ (U+010D), can be encoded in UTF-8 as 0xC4 0x8D, but can also be represented with the two code points ‘LATIN SMALL LETTER C’ and ‘COMBINING CARON’, which in UTF-8 are 0x63 0xCC 0x8C.
More info here: http://www.fileformat.info/info/unicode/char/10d/index.htm

I wonder if there is a library which can convert a ‘LATIN SMALL LETTER C’ + ‘COMBINING CARON’ into ‘LATIN SMALL LETTER C WITH CARON’. Or is there a table containing these conversions?


1 Answer

Editorial Team · Answer 1 · 2026-05-24T07:48:53+00:00

Generally, you use Unicode Normalization to do this.

Using UnicodeUtils.nfkc from the unicode_utils gem (https://github.com/lang/unicode_utils) should get you the specific behavior you’re asking for: Unicode Normalization Form KC (NFKC) applies a compatibility decomposition and then converts the string to its composed form where one is available, which is basically what your example asks for. You may also get close to what you want with Normalization Form C (NFC), which applies only a canonical decomposition before composing.
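For comparison, here is a minimal sketch of the same composition using Ruby's built-in String#unicode_normalize, which is available from Ruby 2.2 onward. This is not the unicode_utils API named above; the helper names compose_nfc, compose_nfkc and hex_bytes are made up for illustration.

```ruby
# Sketch using Ruby's built-in normalization (Ruby 2.2+), not unicode_utils.
# The helper names below are hypothetical, chosen only for this example.

# NFC: canonical decomposition followed by canonical composition.
def compose_nfc(str)
  str.unicode_normalize(:nfc)
end

# NFKC: compatibility decomposition followed by canonical composition.
def compose_nfkc(str)
  str.unicode_normalize(:nfkc)
end

# Render the UTF-8 bytes of a string as "0x.." pairs for inspection.
def hex_bytes(str)
  str.bytes.map { |b| format("0x%02X", b) }.join(" ")
end

decomposed  = "c\u030C" # LATIN SMALL LETTER C + COMBINING CARON
precomposed = "\u010D"  # LATIN SMALL LETTER C WITH CARON

puts hex_bytes(decomposed)                    # 0x63 0xCC 0x8C
puts hex_bytes(compose_nfc(decomposed))       # 0xC4 0x8D
puts compose_nfc(decomposed) == precomposed   # true
puts compose_nfkc(decomposed) == precomposed  # true
```

For this particular pair of code points both forms give the same result, because the decomposition of U+010D is canonical rather than a compatibility mapping.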

The question “How to replace the Unicode gem on Ruby 1.9?” has additional details.

In Ruby 1.8.7, you’d need to run gem install unicode, which provides a similar function.

Edited to add: The main reason you’ll probably want Normalization Form KC (NFKC) instead of just Normalization Form C (NFC) is that ligatures (characters that are squeezed together for historical or typographical reasons) are first decomposed into their individual characters, which is sometimes desirable if you’re doing lexicographic ordering or searching.
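The ligature point can be illustrated with a small search example. This is only a sketch, again assuming Ruby 2.2+ and its built-in String#unicode_normalize rather than either gem; the constant LIGATURE_WORD and the helper normalized_include? are hypothetical names.

```ruby
# Sketch: how NFC and NFKC differ for a ligature when searching.
# Uses Ruby's built-in normalization (Ruby 2.2+); names are hypothetical.

# "find" written with LATIN SMALL LIGATURE FI (U+FB01) as its first character.
LIGATURE_WORD = "\uFB01nd"

# Substring search after normalizing both strings to the given form.
def normalized_include?(text, needle, form)
  text.unicode_normalize(form).include?(needle.unicode_normalize(form))
end

# NFC leaves the ligature as a single code point, so "fi" is not found.
puts normalized_include?(LIGATURE_WORD, "fi", :nfc)   # false

# NFKC expands the ligature to "f" + "i", so the search succeeds.
puts normalized_include?(LIGATURE_WORD, "fi", :nfkc)  # true
```

The trade-off is that NFKC discards the distinction between the ligature and its plain letters, so it suits comparison keys better than text that needs to round-trip unchanged.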
