When reading data from the input file I noticed that the ¥ symbol was not being read by the StreamReader. Mozilla Firefox showed the input file type as Western (ISO-8859-1).
After playing around with the encoding parameters I found it worked successfully for the following values:
System.Text.Encoding.GetEncoding(1252) // (western iso 88591)
System.Text.Encoding.Default
System.Text.Encoding.UTF7
Now I am planning on using the ‘Default’ setting, however I am not very sure if this is the right decision. The existing code did not use any encoding and I am worried I might break something.
I know very little (or rather, nothing) about encoding. How do I go about this? Is my decision to use System.Text.Encoding.Default safe? Should I be asking the user to save the files in a particular format?
Code page 1252 isn’t quite the same as ISO-Latin-1. If you want ISO-Latin-1, use Encoding.GetEncoding(28591). However, I’d expect them to be the same for this code point (U+00A5). UTF-7 is completely different (and almost never what you want to use).

Encoding.Default is not safe; it’s a really bad idea in most situations. It’s specific to the particular computer you’re running on. If you transfer a file from one computer to another, who knows what encoding the original computer was using?

If you know that your file is in ISO-8859-1, then explicitly use that. What’s producing these files? If they’re just being saved by the user, what program are they being saved in? If UTF-8 is an option, that’s a good one, partly because it can cope with the whole of Unicode.
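As a rough illustration of "explicitly use that", here is a minimal sketch that passes the ISO-8859-1 encoding to StreamReader. The class name, the helper ReadLatin1File and the sample bytes are made up for the example; they are not from your code. It also shows the likely reason the ¥ went missing: a StreamReader created without an encoding decodes as UTF-8, and a lone 0xA5 byte is not valid UTF-8, so it comes back as the replacement character U+FFFD.

```csharp
using System;
using System.IO;
using System.Text;

static class Latin1Demo
{
    // ISO-8859-1 is code page 28591 in .NET.
    static readonly Encoding Latin1 = Encoding.GetEncoding(28591);

    // Hypothetical helper: reads a whole file, decoding its bytes as ISO-8859-1.
    public static string ReadLatin1File(string path)
    {
        using (var reader = new StreamReader(path, Latin1))
        {
            return reader.ReadToEnd();
        }
    }

    public static void Main()
    {
        // Assumed sample data: "¥100" saved as ISO-8859-1, where ¥ is the single byte 0xA5.
        string path = Path.GetTempFileName();
        File.WriteAllBytes(path, new byte[] { 0xA5, 0x31, 0x30, 0x30 });

        // Explicit encoding: the first character is U+00A5 (¥).
        string explicitLatin1 = ReadLatin1File(path);
        Console.WriteLine(explicitLatin1[0] == '\u00A5');

        // No encoding given: StreamReader assumes UTF-8, and the stray 0xA5
        // byte is replaced with U+FFFD instead of being read as ¥.
        string assumedUtf8;
        using (var reader = new StreamReader(path))
        {
            assumedUtf8 = reader.ReadToEnd();
        }
        Console.WriteLine(assumedUtf8[0] == '\uFFFD');

        File.Delete(path);
    }
}
```

If the files turn out to be UTF-8 instead, the same pattern applies with Encoding.UTF8 in place of the Latin-1 encoding; the point is only that the encoding is chosen in code rather than taken from the machine’s settings.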
I have an article on Unicode and another on debugging Unicode issues which you may find useful.