In Unicode, a character can be considered in different "compositions". For example the character

Asked: May 24, 2026

In Unicode, a character can be considered in different "compositions".

For example, the character à, whose code point is U+00E0, can also be composed of two code points: U+0061 (the letter a) combined with U+0300 (COMBINING GRAVE ACCENT).
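To make the two compositions concrete, here is a small illustrative sketch in Python (chosen only because its standard `unicodedata` module exposes normalization directly; the variable names are made up for the example). It shows that the two sequences differ code point by code point but become identical once both are brought to the same normalization form:

```python
import unicodedata

precomposed = "\u00e0"  # à as a single code point
decomposed = "a\u0300"  # 'a' followed by COMBINING GRAVE ACCENT

# Different code point sequences, so a plain string comparison fails
print(precomposed == decomposed)           # False
print([hex(ord(c)) for c in precomposed])  # ['0xe0']
print([hex(ord(c)) for c in decomposed])   # ['0x61', '0x300']

# Canonical equivalence: normalizing to the same form makes them equal
print(unicodedata.normalize("NFC", decomposed) == precomposed)  # True
print(unicodedata.normalize("NFD", precomposed) == decomposed)  # True
print(unicodedata.name(decomposed[1]))  # COMBINING GRAVE ACCENT
```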

That leaves the question:

What determines which composition a character ends up in?
The keyboard? The encoding? Copy-pasted text?

I know how to cope with this using the \X metacharacter (which matches an extended grapheme cluster regardless of how it is composed), but I would like someone to explain where the difference comes from.

1 Answer

Editorial Team · Answer 1 · 2026-05-24T07:11:31+00:00

It’s ultimately up to the operating system which code point(s) get stored when you press a key, although there is a convention in the form of the Unicode normalization forms (specifically NFC, which prefers precomposed code points such as U+00E0 where they exist):

http://en.wikipedia.org/wiki/Unicode_equivalence#Normalization
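Since you can't control what a given keyboard or OS produces, a common approach is to apply the NFC convention yourself on input. A minimal Python sketch, where `canonical_key` is a hypothetical helper name and the two strings stand in for the same word arriving from different input methods:

```python
import unicodedata

def canonical_key(text: str) -> str:
    """Normalize to NFC so canonically equivalent inputs compare equal."""
    return unicodedata.normalize("NFC", text)

# The same word as two different input methods might produce it
typed_precomposed = "voil\u00e0"
typed_decomposed = "voila\u0300"

print(typed_precomposed == typed_decomposed)  # False
print(canonical_key(typed_precomposed) == canonical_key(typed_decomposed))  # True
print(unicodedata.is_normalized("NFC", typed_decomposed))  # False (Python 3.8+)
```

Normalizing once at the boundary (when text is read or received) means the rest of the program only ever sees one composition.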

Copy-and-paste copies code points, not graphemes as abstract concepts ("grapheme" is the less ambiguous term, since "character" can mean either a grapheme or a code point).
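Because a paste preserves code points, a decomposed à still counts as two of them after pasting, even though a reader sees one grapheme. The Python sketch below illustrates the gap. Python's built-in `re` does not support `\X`, so `approx_graphemes` is a hypothetical helper that simply attaches combining marks to the preceding character; this is only a rough approximation of the extended grapheme clusters that `\X` matches (it ignores the other UAX #29 rules, such as Hangul syllables and emoji ZWJ sequences):

```python
import unicodedata

def approx_graphemes(text: str) -> list[str]:
    """Group each base character with the combining marks that follow it.

    Rough approximation only: the full UAX #29 grapheme rules are not applied.
    """
    clusters: list[str] = []
    for ch in text:
        if clusters and unicodedata.combining(ch):
            clusters[-1] += ch  # nonzero combining class: attach to previous cluster
        else:
            clusters.append(ch)  # otherwise start a new cluster
    return clusters

pasted = "a\u0300 la carte"  # decomposed à, as it might arrive from a paste
print(len(pasted))                    # 11 code points
print(len(approx_graphemes(pasted)))  # 10 graphemes
```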

If you’re converting from some other character set to Unicode, the conversion mapping dictates which code points you end up with, and it nearly always mirrors how the source character set encodes composite characters: where the source character set has a single code point for LATIN SMALL LETTER A WITH DIAERESIS (ä), the Unicode result will use a single code point too.
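As an illustration, take ISO-8859-1 (Latin-1) as an assumed source character set, where ä is the single byte 0xE4. The Python sketch below shows that decoding yields one precomposed code point, and that the decomposed form has no Latin-1 equivalent until it is recomposed with NFC:

```python
import unicodedata

latin1_bytes = b"\xe4"                 # ä in ISO-8859-1 is the single byte 0xE4
text = latin1_bytes.decode("latin-1")  # the mapping yields one precomposed code point
print([hex(ord(c)) for c in text])     # ['0xe4']
print(unicodedata.name(text))          # LATIN SMALL LETTER A WITH DIAERESIS

decomposed = unicodedata.normalize("NFD", text)  # 'a' + U+0308 COMBINING DIAERESIS
try:
    decomposed.encode("latin-1")
except UnicodeEncodeError:
    print("NFD form has no Latin-1 equivalent")

# Recomposing restores the single code point, which maps back to one byte
print(unicodedata.normalize("NFC", decomposed).encode("latin-1"))  # b'\xe4'
```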
