I work in C# and this is my code:
    Encoding encoding;
    StringBuilder output = new StringBuilder();
    //somePath is string
    using (StreamReader sr = new StreamReader(somePath))
    {
        string line;
        encoding = sr.CurrentEncoding;
        while ((line = sr.ReadLine()) != null)
        {
            //make some changes to line
            output.AppendLine(line);
        }
    }
    using (StreamWriter writer = new StreamWriter(someOtherPath, false))//encoding
    {
        writer.Write(output);
    }
In the file at somePath, I have Norwegian characters such as å, but in the file at someOtherPath I get question marks instead. I think it’s an encoding problem, so I tried reading the input file’s encoding and applying it to the output file. That made no difference. I also tried opening the output file in Google Chrome and selecting every available encoding, but the letters never matched those in the input file.
StreamReader can only make guesses with regard to certain encodings. Ideally, you should find out what the encoding of the file really is, then use that to read the file. What created the file, and what allows you to read it correctly? Does the latter program expose which encoding it’s using? (For example, it may be using something like Windows-1252.)
I would personally recommend using UTF-8 as your output encoding if you can, but it depends on whether you’re in control of whatever’s then reading the output.
EDIT: Okay, now I’ve seen the file, I can confirm it’s not UTF-8. The word “direktør” is represented as these bytes:
So the non-ASCII character is a single byte (F8) which is not a valid UTF-8 representation of a character.
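As a rough way to check this kind of thing yourself, the sketch below decodes a byte sequence with a strict UTF-8 decoder that throws on invalid input. The byte array is an assumption for illustration: only the F8 byte comes from the file, and the remaining values are simply the ASCII codes for the other letters of "direktør". The class and method names (`Utf8Check`, `IsValidUtf8`) are made up for this example.

```csharp
using System;
using System.Text;

static class Utf8Check
{
    // Returns true if the bytes form a valid UTF-8 sequence.
    public static bool IsValidUtf8(byte[] bytes)
    {
        // throwOnInvalidBytes: true makes the decoder throw instead of
        // silently substituting U+FFFD for malformed input.
        var strictUtf8 = new UTF8Encoding(
            encoderShouldEmitUTF8Identifier: false,
            throwOnInvalidBytes: true);
        try
        {
            strictUtf8.GetString(bytes);
            return true;
        }
        catch (DecoderFallbackException)
        {
            return false;
        }
    }

    static void Main()
    {
        // "direktør" with ø stored as the single byte F8
        // (illustrative bytes, assumed single-byte encoding).
        byte[] singleByte = { 0x64, 0x69, 0x72, 0x65, 0x6B, 0x74, 0xF8, 0x72 };

        // The same word encoded as UTF-8, where ø becomes C3 B8.
        byte[] utf8 = Encoding.UTF8.GetBytes("direkt\u00F8r");

        Console.WriteLine(IsValidUtf8(singleByte));     // False
        Console.WriteLine(IsValidUtf8(utf8));           // True
        Console.WriteLine(BitConverter.ToString(utf8)); // 64-69-72-65-6B-74-C3-B8-72
    }
}
```

In real UTF-8, ø takes two bytes (C3 B8), so a lone F8 followed by an ASCII letter is a strong hint that the file uses a single-byte encoding.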
It could be ISO-Latin-1 – it’s not clear (there are multiple encodings which would match). If it is, you can use:
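A minimal sketch of reading with an explicit encoding, assuming the file really is ISO-8859-1 (that part is still a guess). The `Latin1Reader` class and `ReadLines` method are hypothetical names for illustration; the key point is passing an `Encoding` to the `StreamReader` constructor rather than relying on its default.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

static class Latin1Reader
{
    // Reads all lines of a file, decoding the bytes as ISO-8859-1 (Latin-1).
    public static List<string> ReadLines(string path)
    {
        // Assumption: the input is ISO-8859-1. Swap in another encoding
        // here if the file turns out to be something else.
        Encoding latin1 = Encoding.GetEncoding("iso-8859-1");

        var lines = new List<string>();
        using (var reader = new StreamReader(path, latin1))
        {
            string line;
            while ((line = reader.ReadLine()) != null)
            {
                lines.Add(line);
            }
        }
        return lines;
    }

    static void Main(string[] args)
    {
        // Usage: pass the path of the file to read as the first argument.
        foreach (string line in ReadLines(args[0]))
        {
            Console.WriteLine(line);
        }
    }
}
```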
(Alternatively, use File.ReadAllLines to make life simpler.)
You’ll need to separately work out what output encoding you want.
EDIT: Here’s a short but complete program which I’ve run against the file you provided, and which has correctly converted the character to UTF-8:
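What follows is a sketch of what such a program could look like, not necessarily the exact listing that was run. It assumes the input is ISO-8859-1 and writes UTF-8 without a byte order mark; the `Utf8Converter` class, the `Transcode` method, and the command-line arguments are names chosen for illustration.

```csharp
using System;
using System.IO;
using System.Text;

static class Utf8Converter
{
    // Reads inputPath using inputEncoding and rewrites it as UTF-8 at outputPath.
    public static void Transcode(string inputPath, string outputPath, Encoding inputEncoding)
    {
        var output = new StringBuilder();
        using (var reader = new StreamReader(inputPath, inputEncoding))
        {
            string line;
            while ((line = reader.ReadLine()) != null)
            {
                // Any per-line changes would go here.
                output.AppendLine(line);
            }
        }

        // State the output encoding explicitly instead of relying on defaults.
        // UTF8Encoding(false) writes UTF-8 without a byte order mark.
        using (var writer = new StreamWriter(outputPath, false, new UTF8Encoding(false)))
        {
            writer.Write(output.ToString());
        }
    }

    static void Main(string[] args)
    {
        if (args.Length != 2)
        {
            Console.Error.WriteLine("Usage: Utf8Converter <input> <output>");
            return;
        }

        // Assumption: the input file is ISO-8859-1 (Latin-1).
        Transcode(args[0], args[1], Encoding.GetEncoding("iso-8859-1"));
    }
}
```

The important difference from the code in the question is that both the reader and the writer are given an encoding explicitly, so neither side has to guess.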