Many sequences of encoded Unicode characters have the same visual representation and the same computational meaning


Editorial Team

Asked: May 26, 2026


Many sequences of encoded Unicode characters have the same visual representation and the same computational meaning.

The ñ character can be encoded in two ways:

U+00F1:  ñ   (LATIN SMALL LETTER N WITH TILDE)

or:

U+006E:  n   (LATIN SMALL LETTER N)
U+0303:  ~   (COMBINING TILDE)
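As a quick check, here is a minimal sketch using Python's standard `unicodedata` module. Python is an assumption, since the question does not name a language. It prints the official character names and shows that U+00F1 has a canonical decomposition into U+006E U+0303:

```python
import unicodedata

precomposed = "\u00f1"   # one code point: U+00F1
decomposed = "n\u0303"   # two code points: U+006E followed by U+0303

# Official names from the Unicode Character Database
for ch in precomposed + decomposed:
    print(f"U+{ord(ch):04X}  {unicodedata.name(ch)}")

# Canonical decomposition mapping of U+00F1, as hex code points
print(unicodedata.decomposition(precomposed))  # 006E 0303
```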

This creates 10 different byte sequences that display as ñ:

U+00F1 in UTF-8, UTF-16LE, UTF-16BE, UTF-32LE, UTF-32BE
U+006E followed by U+0303 in UTF-8, UTF-16LE, UTF-16BE, UTF-32LE, UTF-32BE
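The ten byte sequences can be produced and compared directly. The following is a Python sketch (the language is an assumption). It uses the endian-specific codecs, which write no byte order mark, so each result is exactly the bytes for those code points:

```python
# The five encodings listed above; these endian-specific codecs emit no BOM
encodings = ["utf-8", "utf-16-le", "utf-16-be", "utf-32-le", "utf-32-be"]
spellings = {"precomposed": "\u00f1", "decomposed": "n\u0303"}

sequences = {}
for label, text in spellings.items():
    for enc in encodings:
        sequences[(label, enc)] = text.encode(enc)

# Show each byte sequence in hex (bytes.hex with a separator needs Python 3.8+)
for (label, enc), data in sequences.items():
    print(f"{label:11}  {enc:9}  {data.hex(' ')}")

# All ten byte strings differ, even though every one of them renders as ñ
print(len(set(sequences.values())))  # 10
```

Comparing raw bytes therefore cannot detect that these are the same text. Decoding removes the encoding differences, but the two code point spellings still remain.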

Is there any straightforward way to compare Unicode strings and find out that they are the same? (I'm happy to work with Unicode characters that have already been decoded from the various UTF representations.) That is, I want something that tells me that U+00F1 is the same as U+006E U+0303.

Thanks.


1 Answer

Editorial Team · Answer 1 · 2026-05-26T09:10:06+00:00


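The process is called normalization, and any decent Unicode library supports it. Convert both strings to the same normalization form, then compare them code point by code point. You could use NFC, which composes U+006E U+0303 into U+00F1, or NFD, which decomposes U+00F1 into U+006E U+0303. Backgrounder is here.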
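Here is a minimal sketch of that approach using Python's standard `unicodedata.normalize`. Python is an assumption, and the helper name `canonically_equal` is hypothetical, chosen only for this example:

```python
import unicodedata

def canonically_equal(a: str, b: str, form: str = "NFC") -> bool:
    """Compare two decoded strings after normalizing both to the same form.

    NFC and NFD agree on whether two strings are canonically equivalent;
    the choice only changes the intermediate representation.
    """
    return unicodedata.normalize(form, a) == unicodedata.normalize(form, b)

precomposed = "\u00f1"   # U+00F1
decomposed = "n\u0303"   # U+006E U+0303

print(precomposed == decomposed)                          # False: different code points
print(canonically_equal(precomposed, decomposed))         # True
print(canonically_equal(precomposed, decomposed, "NFD"))  # True

# Decode first, then normalize, so the original UTF encoding no longer matters
a = b"\xc3\xb1".decode("utf-8")                           # U+00F1 from UTF-8
b = b"\x00\x00\x00n\x00\x00\x03\x03".decode("utf-32-be")  # U+006E U+0303 from UTF-32BE
print(canonically_equal(a, b))                            # True
```

NFC and NFD test canonical equivalence, which is what the question asks about. The compatibility forms NFKC and NFKD are looser. For example, they also equate the "ﬁ" ligature with "fi", so use them only if that broader matching is wanted.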
