Can a Unicode ligature character fi (Unicode U+FB01) have more than one representation in UTF-8?

Question


Asked: May 30, 2026


Can the Unicode ligature character fi (U+FB01) have more than one representation in UTF-8? If so, which ones? And what happens under each normalization form?


1 Answer

Editorial Team · Answer 1 · 2026-05-30T05:43:23+00:00

This depends on the meaning of “character,” which is rather obscure. In Unicode, “character” usually means a codepoint assigned to a character, and this does not match exactly the intuitive concept of “character.”

A single codepoint, such as U+FB01, has only one representation in UTF-8, because UTF-8 defines an unambiguous algorithm for generating the encoded form.
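To make this concrete, here is a minimal Python sketch that encodes U+FB01 and checks that the bytes decode back to the same codepoint. The variable names are only illustrative.

```python
# U+FB01 LATIN SMALL LIGATURE FI as a one-codepoint string.
ligature = "\ufb01"

# UTF-8 maps a codepoint in the range U+0800..U+FFFF to exactly three bytes.
encoded = ligature.encode("utf-8")
hex_bytes = " ".join(f"{b:02X}" for b in encoded)
print(hex_bytes)  # EF AC 81

# Decoding the bytes gives back the same single codepoint.
decoded = encoded.decode("utf-8")
print(len(decoded), f"U+{ord(decoded):04X}")  # 1 U+FB01
```

Any other byte sequence claiming to encode U+FB01, such as an overlong form, is invalid UTF-8 and should be rejected by a conforming decoder.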

An intuitive character, such as the fi ligature, may have different representations as a codepoint or as a sequence of codepoints, which each have UTF-8 representations. Unicode normalization rules define, in part, mappings between such alternatives.

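But the compatibility mapping for U+FB01 (to U+0066 U+0069, i.e. “f” followed by “i”) does not preserve the identity of an intuitive character: the ligature is mapped to two normal letters. This mapping is applied only by the compatibility forms NFKC and NFKD; the canonical forms NFC and NFD leave U+FB01 unchanged.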
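The effect of each normalization form on U+FB01 can be checked with Python's standard `unicodedata` module. This is a small sketch; the dictionary name `forms` is only illustrative.

```python
import unicodedata

ligature = "\ufb01"  # LATIN SMALL LIGATURE FI

# Normalize the ligature under each of the four Unicode normalization forms.
forms = {
    name: unicodedata.normalize(name, ligature)
    for name in ("NFC", "NFD", "NFKC", "NFKD")
}

for name, result in forms.items():
    codepoints = " ".join(f"U+{ord(ch):04X}" for ch in result)
    utf8_hex = " ".join(f"{b:02X}" for b in result.encode("utf-8"))
    print(f"{name}: {codepoints} -> UTF-8 bytes {utf8_hex}")

# NFC and NFD keep U+FB01 (bytes EF AC 81);
# NFKC and NFKD yield U+0066 U+0069 (bytes 66 69).
```

So after NFKC or NFKD the text contains the plain letters "f" and "i", and nothing in the normalized string records that a ligature was ever there.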

On the other hand, you can ask for, or suggest, ligature behavior by inserting U+200D ZERO WIDTH JOINER (ZWJ) between two letters, like “f” and “i”. In a sense, the sequence U+0066 U+200D U+0069 is an alternative representation of the fi ligature, but this is not a formal property of the characters, and whether a ligature is actually displayed depends on whether the rendering software pays attention to ZWJ.
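As a rough illustration of why the ZWJ sequence is not a formal equivalent, the following Python sketch builds it, shows its UTF-8 bytes, and compares it with U+FB01 under the four normalization forms. The variable names are only illustrative.

```python
import unicodedata

ligature = "\ufb01"          # LATIN SMALL LIGATURE FI
zwj_sequence = "f\u200di"    # f, ZERO WIDTH JOINER, i

# The ZWJ sequence is three codepoints and five UTF-8 bytes.
zwj_bytes = zwj_sequence.encode("utf-8")
print(" ".join(f"{b:02X}" for b in zwj_bytes))  # 66 E2 80 8D 69

# No normalization form maps the two representations to the same string.
equal_under = {
    name: unicodedata.normalize(name, ligature)
    == unicodedata.normalize(name, zwj_sequence)
    for name in ("NFC", "NFD", "NFKC", "NFKD")
}
print(equal_under)
```

Every comparison comes out false, because normalization leaves the ZWJ in place. Code that wants to treat the two as the same text would have to handle ZWJ itself.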
