I am trying to write something in Dutch to a CSV a file and this is what happens
In the following program, ideally, “Eéntalige affiche in Halle !!” should be written in the csv file. However, it’s writing “Eéntalige affiche in Halle !!”
# -*- encoding: utf-8 -*-
import csv
S="Eéntalige affiche in Halle !!".encode("utf-8")
file=c = csv.writer(open("Test.csv","wb"))
file.writerow([S])
In the CSV file== ? “Eéntalige affiche in Halle !!”
You are writing data correctly. The problem lies in whatever is reading the data; it is interpreting the UTF-8 data as Latin 1 instead:
The U+00E9 codepoint (é, LATIN SMALL LETTER E WITH ACUTE) is encoded to two bytes in UTF-8, C3 and A9 in hex. If you treat those two bytes as Latin1 instead, where each character is always only one byte, you get
Ãand©instead.There is no standard for how to treat CSV files and encoding, you’ll need to adjust your encoding to the intended target application to read this information. Microsoft Excel reads CSV files according to the current codepage, for example.
If your CSV reader is expecting Latin 1, by all means, encode to Latin 1 instead.