I'm reading a file with ReadAllText:
// In a verbatim (@) string, backslashes are not escaped, so use single ones.
String[] values = File.ReadAllText(@"c:\c\file.txt").Split(';');
int i = 0;
foreach (String s in values)
{
    System.Console.WriteLine("output: {0} {1} ", i, s);
    i++;
}
When I read some files, I sometimes get the wrong characters back (for ÖÜÄÀ…). They come out as ‘?’ because something is wrong with the encoding:
output: 0 TEST
output: 1 A??O?
One solution would be to set the encoding in ReadAllText, say something like ReadAllText(@"c:\c\file.txt", Encoding.UTF8), which could fix the problem. But what if I still get ‘?’ as output? What if I don't know the encoding of the file? And what if every single file has a different encoding? What would be the best way to handle this in C#? Thank you.
The only way to do this reliably is to look for a byte order mark (BOM) at the start of the text file. (This marker primarily indicates the endianness of the character encoding used, but it also identifies the encoding itself, e.g. UTF-8, UTF-16, or UTF-32.) Unfortunately, this method only works for Unicode-based encodings, and only when the file was actually written with a BOM; for older encodings, much less reliable methods must be used.
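To make the byte order mark check concrete, here is a minimal sketch that compares the first bytes of a buffer against the well-known BOM sequences. The class name BomSniffer and the method FromBom are hypothetical names chosen for illustration, not part of the .NET API, and returning null for "no BOM found" is an assumption about how the caller wants to handle that case.

```csharp
using System;
using System.Text;

static class BomSniffer
{
    // Hypothetical helper: maps a leading byte order mark to an encoding,
    // or returns null when no known BOM is present.
    public static Encoding FromBom(byte[] b)
    {
        // Check the 4-byte UTF-32 marks first: FF FE 00 00 also starts with the UTF-16 LE mark FF FE.
        if (b.Length >= 4 && b[0] == 0xFF && b[1] == 0xFE && b[2] == 0x00 && b[3] == 0x00)
            return Encoding.UTF32;                 // UTF-32 little endian
        if (b.Length >= 4 && b[0] == 0x00 && b[1] == 0x00 && b[2] == 0xFE && b[3] == 0xFF)
            return new UTF32Encoding(true, true);  // UTF-32 big endian
        if (b.Length >= 3 && b[0] == 0xEF && b[1] == 0xBB && b[2] == 0xBF)
            return Encoding.UTF8;
        if (b.Length >= 2 && b[0] == 0xFF && b[1] == 0xFE)
            return Encoding.Unicode;               // UTF-16 little endian
        if (b.Length >= 2 && b[0] == 0xFE && b[1] == 0xFF)
            return Encoding.BigEndianUnicode;      // UTF-16 big endian
        return null;                               // no BOM: the encoding is unknown
    }

    public static void Main()
    {
        byte[] sample = Encoding.UTF8.GetPreamble(); // the UTF-8 BOM bytes
        Encoding found = FromBom(sample);
        Console.WriteLine(found == null ? "no BOM" : found.WebName);
    }
}
```

A null result only means that no BOM was found; the file may still be UTF-8 without a BOM or use a legacy code page, which this check cannot tell apart.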
The StreamReader type supports detecting these marks to determine the encoding: you simply need to pass true for the detectEncodingFromByteOrderMarks parameter of its constructor. After the first read, you can check the value of streamReader.CurrentEncoding to determine the encoding used by the file. Note, however, that if no byte order mark exists, CurrentEncoding stays at the encoding the reader was constructed with (UTF-8 when none is specified), which says nothing about the file's actual encoding. See also the CodeProject solution for detecting encoding.
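The StreamReader approach above can be sketched roughly as follows. The names EncodingSniffer and DetectEncoding are hypothetical, and passing the fallback encoding in from the caller is an assumption. The sample file written in Main exists only to make the sketch self-contained.

```csharp
using System;
using System.IO;
using System.Text;

static class EncodingSniffer
{
    // Hypothetical helper: returns the encoding StreamReader settles on after its first read.
    public static Encoding DetectEncoding(string path, Encoding fallback)
    {
        // The third argument enables byte order mark detection;
        // 'fallback' is what the reader keeps when the file has no BOM.
        using (var reader = new StreamReader(path, fallback, true))
        {
            reader.Peek(); // detection only happens once the reader has read from the file
            return reader.CurrentEncoding;
        }
    }

    public static void Main()
    {
        string path = Path.GetTempFileName();
        try
        {
            // Sample data: "TEST;ÄÖÜ" saved as UTF-16 LE, which writes a BOM.
            File.WriteAllText(path, "TEST;\u00C4\u00D6\u00DC", Encoding.Unicode);

            Encoding detected = DetectEncoding(path, Encoding.UTF8);
            string[] values = File.ReadAllText(path, detected).Split(';');

            for (int i = 0; i < values.Length; i++)
                Console.WriteLine("output: {0} {1}", i, values[i]);
        }
        finally
        {
            File.Delete(path);
        }
    }
}
```

If the files may come without a BOM, the fallback passed to DetectEncoding is effectively a guess, so it should be whatever encoding the files are most likely to use.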