I’m writing a C#/WinForms application that contains a DataGridView with 2,000 rows. My users will enter strings into these rows. The strings may be in any language, but the two most likely languages are English and Arabic. I don’t have an explicit limit on the number of characters in a string, but I do have a limit of 2048 bytes to store each string when it is written to disk. If the resulting byte array is shorter than 2048 bytes, I need to pad it with null bytes. I’m assuming that UTF-8 would probably be the most efficient encoding for storing these strings? If so, I was thinking of doing something like this before allowing the string to be stored:
    byte[] stringAsBytes = System.Text.Encoding.UTF8.GetBytes(myString);
    if (stringAsBytes.Length > 2048)
    {
        // string is too long to be stored in 2048 bytes
    }
If I understand correctly, since UTF-8 is a variable-length encoding, the maximum number of characters in a given string will be dependent on the code point range for the characters that comprise the language of the string? If that’s right, would I really need to do something like the code above for each key press to determine exactly when the string has exceeded the maximum size for storage?
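No, you don’t need to build the byte array on every key press. Encoding.UTF8.GetByteCount(myString) returns the number of bytes the string would occupy without allocating the array, so you can compare that value against 2048.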
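A minimal sketch of such a check, assuming the goal is to measure the encoded size without building the byte array. `Encoding.UTF8.GetByteCount` is a real framework method; the class and method names (`StorageCheck`, `FitsInRecord`) and the `MaxBytes` constant are hypothetical and only illustrate the idea:

```csharp
using System;
using System.Text;

public static class StorageCheck
{
    // Size of the on-disk slot, taken from the question.
    public const int MaxBytes = 2048;

    // Returns true if the UTF-8 encoding of the text fits in the slot.
    // GetByteCount measures the encoded size without allocating a byte array.
    public static bool FitsInRecord(string text)
    {
        return Encoding.UTF8.GetByteCount(text) <= MaxBytes;
    }
}
```

You could call something like this when the cell is validated rather than on every key press. For reference, ASCII letters take 1 byte each in UTF-8 and Arabic letters take 2, so the same character count can produce quite different byte counts.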
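Alternatively, you could limit the string length to the buffer size divided by the encoding’s worst-case bytes per character, for example 2048 / Encoding.UTF8.GetMaxByteCount(1), which would guarantee that every string in that encoding fits into the buffer. Unfortunately, that’s only 341 chars for UTF-8.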
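As a rough sketch of the fixed-length approach, assuming the 2048-byte slot from the question: the first method computes the conservative character limit, and the second encodes a string into a null-padded record. The names `FixedRecord`, `SafeCharLimit`, and `ToRecord` are hypothetical.

```csharp
using System;
using System.Text;

public static class FixedRecord
{
    // Size of the on-disk slot, taken from the question.
    public const int RecordSize = 2048;

    // Conservative character limit: slot size divided by the worst case
    // the encoding reports for a single char.
    public static int SafeCharLimit()
    {
        return RecordSize / Encoding.UTF8.GetMaxByteCount(1);
    }

    // Encodes the text as UTF-8 into a RecordSize-byte array.
    // Unused trailing bytes stay 0, which gives the null padding.
    public static byte[] ToRecord(string text)
    {
        if (Encoding.UTF8.GetByteCount(text) > RecordSize)
        {
            throw new ArgumentException("String does not fit in the record.", nameof(text));
        }

        byte[] record = new byte[RecordSize]; // new arrays are zero-filled
        Encoding.UTF8.GetBytes(text, 0, text.Length, record, 0);
        return record;
    }
}
```

The 341 figure is deliberately pessimistic, because GetMaxByteCount allows for worst-case encoder state. A single UTF-16 char never needs more than 3 bytes in UTF-8, so checking the actual byte count lets users store considerably more text than a fixed character limit would.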