I get the following encoding error while copying data from CSV files into a database table:
psycopg2.DataError: invalid byte sequence for encoding "UTF8": 0xf8
HINT: This error can also happen if the byte sequence does not match the encoding expected by the server, which is controlled by "client_encoding".
I am not using any explicit encoding or decoding calls. To copy the data from a file into a table, I use the following code:
cur.copy_from(myFile, myTable)
The files contain a lot of special characters and weird data, but I want to store all of it.
EDIT
The table is:
create table myTable(id integer, name character varying(10000));
and a sample of the CSV file is:
"1";"This is |_|¨^~~ || ¨text wuth special charater like Bjш;; ø"
"2";"Test data -._.- (2010/10/11) "
You write that you are not specifying any encoding, so it seems that psycopg2 defaults to UTF-8.
0xf8 is not a valid byte anywhere in UTF-8, so it cannot be a single-byte UTF-8 character. Is your source file possibly in ISO-8859-1, where 0xf8 corresponds to ø?
Edit:
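The guess about ISO-8859-1 can be checked quickly in Python. This is only a minimal sketch: the byte value is the one from the error message, and the variable names are made up for illustration.

```python
raw = b"\xf8"  # the byte reported in the PostgreSQL error message

# Decoded as UTF-8, this byte is rejected outright.
try:
    raw.decode("utf-8")
    valid_utf8 = True
except UnicodeDecodeError:
    valid_utf8 = False

# Decoded as ISO-8859-1, it maps to exactly one character (U+00F8).
as_latin1 = raw.decode("iso-8859-1")

print(valid_utf8)
print(as_latin1 == "\u00f8")
```

If the real file decodes cleanly as ISO-8859-1 and the result looks right, that is a strong hint (though not proof) about its actual encoding.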
There are several places where this problem could be addressed, and which of them is correct depends on your situation.
If you will have to import ISO-8859-1 files repeatedly, you might want to handle the encoding explicitly in your script so that it behaves consistently.
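One way to handle the encoding inside the script is to re-encode the file contents to UTF-8 before handing them to copy_from. The following is a sketch under assumptions, not a tested import routine: the helper name reencode_to_utf8 and the file name in the usage comment are hypothetical, and it assumes the files really are ISO-8859-1 and small enough to read into memory.

```python
import io


def reencode_to_utf8(raw_bytes, source_encoding="iso-8859-1"):
    """Return a binary file-like object holding raw_bytes re-encoded as UTF-8.

    source_encoding is an assumption about the input files; adjust it if the
    data turns out to be in a different encoding.
    """
    text = raw_bytes.decode(source_encoding)
    return io.BytesIO(text.encode("utf-8"))


# Hypothetical usage with the cursor and table from the question:
#
#     with open("data.csv", "rb") as f:          # file name is made up
#         cur.copy_from(reencode_to_utf8(f.read()), myTable)
```

An alternative, if you would rather let the server do the conversion, is to tell PostgreSQL what the client is sending, for example with psycopg2's connection.set_client_encoding("LATIN1") before the copy. Which of the two fits better depends on whether other parts of the script rely on the connection being UTF-8.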
If you only need to do this import once, why not simply convert the files to the expected encoding outside of Python, for example with iconv or recode?