What is the maximum number of bytes for a single UTF-8 encoded character?
I’ll be encrypting the bytes of a String encoded in UTF-8 and therefore need to be able to work out the maximum number of bytes for a UTF-8 encoded String.
Could someone please confirm the maximum number of bytes for a single UTF-8 encoded character?
The maximum number of bytes per character is 4, according to RFC 3629, which limited the character table to U+10FFFF. (The original specification allowed for up to six-byte character codes for code points past U+10FFFF.)

Characters with a code less than 128 (U+0000 to U+007F) require only 1 byte, and the next 1920 character codes (U+0080 to U+07FF) require only 2 bytes. Code points from U+0800 to U+FFFF take 3 bytes, and only those from U+10000 to U+10FFFF take the full 4. Unless you are working with an esoteric language, multiplying the character count by 4 will be a significant overestimation.
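If it helps to see the byte counts concretely, here is a minimal sketch in Python (the question does not name a language, so that choice is an assumption, as are the helper names `utf8_len` and `utf8_max_len`). It compares the exact encoded size with the 4-bytes-per-code-point upper bound.

```python
def utf8_len(text: str) -> int:
    """Exact number of bytes needed to encode `text` as UTF-8."""
    return len(text.encode("utf-8"))


def utf8_max_len(text: str) -> int:
    """Worst-case bound: 4 bytes for every code point in `text`."""
    return 4 * len(text)


# One sample character from each encoded-length class.
samples = {
    "A": 1,           # U+0041, below 128
    "\u00e9": 2,      # U+00E9, within U+0080..U+07FF
    "\u20ac": 3,      # U+20AC, within U+0800..U+FFFF
    "\U0001F600": 4,  # U+1F600, within U+10000..U+10FFFF
}

for ch, expected in samples.items():
    assert utf8_len(ch) == expected

text = "h\u00e9llo"  # five code points, one of which needs 2 bytes
print(utf8_len(text), utf8_max_len(text))
```

If you need a buffer size before encoding, the upper bound is safe; if you can encode first, measuring the actual byte length avoids the overestimate.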