What’s a good estimate/conversion/formula to figure out X# characters = Y# bytes?
It entirely depends on the encoding and potentially the data.
For UTF-16, if you know that all the characters are in the Basic Multilingual Plane, the answer will be bytes = 2 * characters. Any character outside that plane is stored as a surrogate pair and takes 4 bytes.
For UTF-8, if everything is in the ASCII range, then bytes = characters. If there are lots of Far Eastern characters, it could be as much as bytes = 3 * characters, and that's still assuming the Basic Multilingual Plane; characters outside it take 4 bytes each.
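To see those ratios on real text, here is a minimal Python sketch. The sample strings and the helper name `byte_lengths` are made up for illustration, and "characters" here means Unicode code points, which is what Python's `len` counts on a `str`. The `utf-16-le` codec is used so that no byte order mark is added to the count.

```python
def byte_lengths(text):
    """Count code points and encoded bytes for one string."""
    return {
        "chars": len(text),                       # Unicode code points
        "utf-8": len(text.encode("utf-8")),
        "utf-16": len(text.encode("utf-16-le")),  # -le: no BOM prepended
    }

# Illustrative samples only: ASCII, BMP (Japanese), and one non-BMP emoji.
samples = {
    "ascii": "hello",
    "far_eastern": "日本語",
    "outside_bmp": "😀",
}

for name, text in samples.items():
    print(name, byte_lengths(text))
```

The ASCII sample comes out at 1 byte per character in UTF-8 and 2 in UTF-16, the Japanese sample at 3 and 2, and the emoji at 4 bytes in both.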
Other encodings obviously have different scenarios. Could you give more details about your situation (and your platform)? Do you want an accurate calculated value based on actual characters? Do you know anything about the text you’re going to encode?
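If all you have is a character count and a rough idea of the content, the best you can do is a range. The sketch below turns the rules of thumb above into a (min, max) estimate; the function name `estimate_bytes` and its flags are hypothetical, it only covers UTF-8 and UTF-16, and it does not count a byte order mark.

```python
def estimate_bytes(chars, encoding, bmp_only=True, ascii_only=False):
    """Return (min_bytes, max_bytes) for `chars` Unicode code points.

    bmp_only:   assume every character is in the Basic Multilingual Plane.
    ascii_only: assume every character is in the ASCII range.
    """
    if encoding == "utf-8":
        if ascii_only:
            return chars, chars
        # 1 byte at best; 3 bytes at worst inside the BMP, 4 outside it.
        return chars, (3 if bmp_only else 4) * chars
    if encoding == "utf-16":
        # ASCII lies inside the BMP, so either flag means 2 bytes each.
        if bmp_only or ascii_only:
            return 2 * chars, 2 * chars
        # Outside the BMP a character needs a 4-byte surrogate pair.
        return 2 * chars, 4 * chars
    raise ValueError(f"no estimate for encoding {encoding!r}")


print(estimate_bytes(100, "utf-8"))
print(estimate_bytes(100, "utf-16", bmp_only=False))
```

When the actual text is available, encoding it and measuring the result is exact and usually simpler than estimating.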