Can a UTF-8 string contain zero bytes? I'm going to send it over an ASCII plaintext protocol. Should I encode it with something like base64?
Yes. The zero byte in UTF-8 is code point 0, NUL. No other Unicode code point is encoded in UTF-8 with a zero byte anywhere within it.
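The possible code points and their UTF-8 encodings are:
U+0000 to U+007F: 0xxxxxxx
U+0080 to U+07FF: 110xxxxx 10xxxxxx
U+0800 to U+FFFF: 1110xxxx 10xxxxxx 10xxxxxx
U+10000 to U+10FFFF: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx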
You can see that all the non-zero ASCII characters are represented as themselves, while all multibyte sequences have the high bit set to 1 in every byte.
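If you want to check this rather than take it on trust, a brute-force pass over every code point is cheap. The following is a minimal sketch in Python; the function names are made up for illustration. It skips the surrogate range U+D800 to U+DFFF, which cannot be encoded in UTF-8 at all.

```python
def encodable_code_points():
    """Yield every Unicode code point that has a UTF-8 encoding."""
    for cp in range(0x110000):
        if 0xD800 <= cp <= 0xDFFF:
            continue  # surrogates are not valid in UTF-8
        yield cp


def code_points_with_zero_byte():
    """Return the code points whose UTF-8 encoding contains a 0x00 byte."""
    return [cp for cp in encodable_code_points()
            if 0 in chr(cp).encode("utf-8")]


def multibyte_sequences_all_high_bit():
    """Check that every byte of every multibyte encoding has bit 7 set."""
    return all(
        all(b >= 0x80 for b in chr(cp).encode("utf-8"))
        for cp in encodable_code_points()
        if cp >= 0x80
    )
```

Calling `code_points_with_zero_byte()` should give back only code point 0, and `multibyte_sequences_all_high_bit()` should be true, which matches the two claims above.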
You may need to check that your ASCII plaintext protocol doesn't mishandle bytes with the high bit set, since every non-ASCII code point is encoded using only such bytes.
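As for the base64 part of the question, here is a rough sketch of how you might decide, assuming the channel is only safe for non-zero 7-bit bytes (that is an assumption about your protocol, not something I can know). The helper names are hypothetical; `base64.b64encode` and `base64.b64decode` are from the Python standard library.

```python
import base64


def is_seven_bit_clean(text):
    """True if the UTF-8 form has no NUL bytes and no bytes with the high bit set."""
    return all(0 < b < 0x80 for b in text.encode("utf-8"))


def encode_for_ascii_channel(text):
    """Base64-encode the UTF-8 bytes so that only ASCII characters go on the wire."""
    return base64.b64encode(text.encode("utf-8")).decode("ascii")


def decode_from_ascii_channel(payload):
    """Reverse encode_for_ascii_channel."""
    return base64.b64decode(payload).decode("utf-8")
```

If `is_seven_bit_clean` is true for everything you send, the text can go out as is. Otherwise base64 costs roughly a third more bytes but sidesteps both the NUL and the high-bit problem. Note that a 7-bit-clean string can still contain control characters such as newlines, which a line-based protocol may treat specially.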