First, let’s see the code:
using System.IO;
using System.Text;

// utf8.txt is encoded as UTF-8
using (StreamReader reader = new StreamReader(@"C:\utf8.txt", Encoding.UTF8, true))
{
    // Peek() returns -1 at the end of the stream, so test for >= 0
    while (reader.Peek() >= 0)
    {
        // What is the encoding of lineFromTxtFile?
        string lineFromTxtFile = reader.ReadLine();
    }
}
As Joel said in his famous article:
“If you have a string, in memory, in a file, or in an email message, you have to know what encoding it is in or you cannot interpret it or display it to users correctly.”
So here is my question: what is the encoding of the string lineFromTxtFile? Is it UTF-8 (because it comes from a text file encoded in UTF-8), or UTF-16 (because a string in .NET is “Unicode”, i.e. UTF-16)?
Thanks.
.NET strings are Unicode, held in memory as UTF-16. The StreamReader decodes the UTF-8 bytes of the file as it reads them, so the file's encoding plays no further part until you next need to turn the string back into bytes. If you write it out to a file, for example, you specify the output encoding at that point. But since .NET handles everything you do with the string via library calls, it doesn't matter how it's represented in memory.
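To make the point concrete, here is a minimal sketch. The class name EncodingDemo and its two helper methods are made up for illustration, and an in-memory byte array stands in for the UTF-8 file. It shows the same string taking a different number of bytes depending on which encoding you choose when converting it back to bytes.

```csharp
using System;
using System.IO;
using System.Text;

public static class EncodingDemo
{
    // Hypothetical helper: decodes UTF-8 bytes into a .NET string,
    // the same way a StreamReader over a UTF-8 file would.
    public static string ReadFirstLine(byte[] utf8Bytes)
    {
        using (var reader = new StreamReader(new MemoryStream(utf8Bytes), Encoding.UTF8))
        {
            return reader.ReadLine();
        }
    }

    // Hypothetical helper: how many bytes the string would occupy
    // if it were written out in the given encoding.
    public static int ByteCount(string text, Encoding encoding)
    {
        return encoding.GetByteCount(text);
    }

    public static void Main()
    {
        // "héllo" as UTF-8 bytes: 'é' takes two bytes, the other letters one each.
        byte[] utf8Bytes = Encoding.UTF8.GetBytes("h\u00E9llo");
        string line = ReadFirstLine(utf8Bytes);

        Console.WriteLine(utf8Bytes.Length);                   // 6 bytes in the "file"
        Console.WriteLine(line.Length);                        // 5 UTF-16 code units in memory
        Console.WriteLine(ByteCount(line, Encoding.UTF8));     // 6 bytes if written as UTF-8
        Console.WriteLine(ByteCount(line, Encoding.Unicode));  // 10 bytes if written as UTF-16
    }
}
```

Once the reader has decoded the bytes, the string carries no trace of the file's UTF-8 origin. An encoding only comes back into play when you ask for bytes again, for instance by passing one to a StreamWriter.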