Can someone please confirm that all Chinese characters (Hanzi, called Kanji in Japanese) are 3 bytes long in UTF-8?
The commonly used Hanzi/Kanji characters are in the “CJK Unified Ideographs” block between U+4E00 and U+9FFF, and take 3 bytes in UTF-8. (The Japanese Hiragana and Katakana characters also take 3 bytes.)
However, there are also some very rarely used characters in the “CJK Unified Ideographs Extension B” (U+20000 to U+2A6DF) and “CJK Compatibility Ideographs Supplement” (U+2F800 to U+2FA1F) blocks. These lie above U+FFFF and therefore take 4 bytes in UTF-8.
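The 3-byte versus 4-byte split follows directly from the code point ranges of UTF-8. Here is a minimal Python sketch to check it yourself; the helper name `utf8_length` and the sample characters are my own choices for illustration, not part of any library:

```python
def utf8_length(ch: str) -> int:
    """Bytes UTF-8 needs for one character, derived from its code point."""
    cp = ord(ch)
    if cp < 0x80:       # U+0000 to U+007F (ASCII)
        return 1
    if cp < 0x800:      # U+0080 to U+07FF
        return 2
    if cp < 0x10000:    # U+0800 to U+FFFF (rest of the Basic Multilingual Plane)
        return 3
    return 4            # U+10000 and above (supplementary planes)


# Illustrative samples, one per block mentioned above.
samples = {
    "\u4e2d": "CJK Unified Ideographs",
    "\u3042": "Hiragana",
    "\u30ab": "Katakana",
    "\U00020000": "CJK Unified Ideographs Extension B",
    "\U0002F800": "CJK Compatibility Ideographs Supplement",
}

for ch, block in samples.items():
    encoded = len(ch.encode("utf-8"))
    # The range rule should agree with what the real encoder produces.
    assert encoded == utf8_length(ch)
    print(f"U+{ord(ch):04X} ({block}): {encoded} bytes")
```

The first three samples should report 3 bytes and the last two 4 bytes.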
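Also be aware that Chinese text often contains ASCII characters like the digits 0-9, which take only 1 byte each in UTF-8, so you cannot get the byte length of a string by multiplying its character count by 3.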
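To illustrate the mixed case, here is a small Python sketch; the sample sentence is just something I made up (four Hanzi plus one ASCII digit):

```python
# "我有3本书" written with escapes: four Hanzi and the ASCII digit 3.
text = "\u6211\u67093\u672c\u4e66"

# Byte count of each character on its own.
sizes = [(ch, len(ch.encode("utf-8"))) for ch in text]

# Byte count of the whole string.
total = len(text.encode("utf-8"))

for ch, size in sizes:
    print(f"{ch!r}: {size} byte(s)")
print(f"{len(text)} characters, {total} bytes")
```

The string has 5 characters but encodes to 13 bytes rather than 15, because the digit takes a single byte.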