I need to convert a CSV file from ISO-8859-1 to UTF-8 so that the accents are kept in the database.
French accented characters (é, è, ê, and the like) are not preserved when I try to convert them to UTF-8; they are changed to ‘?’.
I’m stumped.
I use the following function for the translation:
    public static string iso8859ToUnicode(string src)
    {
        Encoding iso = Encoding.GetEncoding("iso8859-1");
        Encoding unicode = Encoding.UTF8;
        byte[] isoBytes = iso.GetBytes(src);
        byte[] unibytes = Encoding.Convert(iso, unicode, isoBytes);
        char[] unichars = new char[iso.GetCharCount(unibytes, 0, unibytes.Length)];
        unicode.GetChars(unibytes, 0, unibytes.Length, unichars, 0);
        return new string(unichars);
    }
But it doesn’t seem to work well. Help?
I strongly suspect that your original string doesn’t have the correct values. My guess is that you’ve read it from the file as if it were UTF-8.
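To illustrate that guess, here is a minimal sketch of what happens when ISO-8859-1 bytes are decoded as UTF-8. The byte array and class name are made up for the demonstration and are not taken from your file:

```csharp
using System;
using System.Text;

public static class WrongDecodingDemo
{
    public static void Main()
    {
        // "café" as stored in an ISO-8859-1 file: é is the single byte 0xE9.
        byte[] isoBytes = { 0x63, 0x61, 0x66, 0xE9 };

        // Decoding as UTF-8 treats 0xE9 as the start of a multi-byte sequence.
        // The sequence is incomplete, so the decoder substitutes U+FFFD.
        string wrong = Encoding.UTF8.GetString(isoBytes);

        // Decoding with the encoding the file really uses keeps the accent.
        string right = Encoding.GetEncoding("iso-8859-1").GetString(isoBytes);

        Console.WriteLine(wrong[3] == '\uFFFD'); // True
        Console.WriteLine(right[3] == '\u00E9'); // True
    }
}
```

Once the accent has been replaced by U+FFFD in the string, no later conversion can bring it back, and encoding that character to ISO-8859-1 typically produces ‘?’.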
To convert between two encodings, you shouldn’t have the string in the first place: load the bytes of the file and call Encoding.Convert() on them. Alternatively, load the file using ISO-Latin-1 and just save it as UTF-8.
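Both approaches can be sketched roughly as follows. The class name, method names, and path parameters are placeholders of mine, and the sketch assumes the input file really is ISO-8859-1:

```csharp
using System.IO;
using System.Text;

public static class CsvEncodingConverter
{
    // Option 1: work on raw bytes only, never going through a string.
    public static void ConvertBytes(string inputPath, string outputPath)
    {
        Encoding iso = Encoding.GetEncoding("iso-8859-1");
        byte[] isoBytes = File.ReadAllBytes(inputPath);
        byte[] utf8Bytes = Encoding.Convert(iso, Encoding.UTF8, isoBytes);
        File.WriteAllBytes(outputPath, utf8Bytes);
    }

    // Option 2: read the text with the correct source encoding,
    // then write it back out as UTF-8.
    public static void ConvertText(string inputPath, string outputPath)
    {
        Encoding iso = Encoding.GetEncoding("iso-8859-1");
        string text = File.ReadAllText(inputPath, iso);
        File.WriteAllText(outputPath, text, Encoding.UTF8);
    }
}
```

One difference to be aware of: File.WriteAllText with Encoding.UTF8 writes a byte order mark at the start of the file, while Encoding.Convert does not add one. If the database import chokes on the BOM, passing new UTF8Encoding(false) instead of Encoding.UTF8 should avoid it.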