I have a small problem: I am building a database from CSV files using a Java application connected to a MySQL database.
CSV is ISO-8859-1 encoded.
It is read via a buffered file reader and parsed with String methods.
The strings are then inserted into MySQL via the JDBC driver.
The problem is that accents (this is a French application) are lost in the transfer. In the MySQL database, they end up in an unidentified encoding that is neither UTF-8 nor Latin-1…
My hypothesis is that the strings are encoded incorrectly and keep this encoding when inserted. How can I enforce the charset for an INSERT statement in Java?
You need to ensure that you read the CSV using InputStreamReader with the proper charset, which is that of the file itself: ISO-8859-1 in this case.

You also need to ensure that the JDBC connection string contains a characterEncoding parameter with the proper charset, which is the one the table was created with and which you have yet to determine on the MySQL side. If it turns out to be a Unicode charset, then you need to add the parameter useUnicode=true as well.

Your next question will probably be "How do I determine which charset my DB table is using?" You can do this using the SHOW command. Its output contains information about the charset.

That said, and unrelated to the problem: are you aware that MySQL offers built-in CSV import facilities, so you don't necessarily need Java/JDBC for this? Check out the LOAD DATA INFILE command. You can specify the CSV file's charset as a command argument and MySQL will handle the conversion itself.
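
To illustrate the first point (reading the CSV through an InputStreamReader with an explicit charset), here is a minimal sketch. The class name, the sample row and the `;` separator are assumptions made for the example, not details from the question.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class, named for illustration only.
class Latin1CsvReader {

    // Reads every line of the file, decoding its bytes as ISO-8859-1
    // instead of relying on the platform default charset.
    static List<String> readLines(Path csvFile) throws IOException {
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(
                Files.newInputStream(csvFile), StandardCharsets.ISO_8859_1))) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        // Write a small ISO-8859-1 sample so the sketch is self-contained.
        // Unicode escapes keep this source file independent of its own encoding.
        Path sample = Files.createTempFile("sample", ".csv");
        Files.write(sample,
                "1;H\u00e9l\u00e8ne;Orl\u00e9ans\n".getBytes(StandardCharsets.ISO_8859_1));
        for (String line : readLines(sample)) {
            // Assumed separator: ';' (adjust to the real CSV dialect).
            System.out.println(String.join(" | ", line.split(";")));
        }
        Files.delete(sample);
    }
}
```

A plain FileReader without a charset argument uses the platform default encoding on older Java versions, which is a common way for accents to get mangled before they ever reach JDBC.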
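
For the second point (the characterEncoding and useUnicode parameters in the JDBC connection string), a rough sketch could look like the following. The host, database name, credentials, and the `person` table with its `name` column are all hypothetical, and the encoding value should match whatever the table actually uses.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.List;

// Hypothetical helper class, named for illustration only.
class CharsetAwareInsert {

    // Builds a MySQL JDBC URL that declares the connection encoding.
    // The encoding is given as a Java charset name, e.g. "UTF-8" or "ISO-8859-1".
    static String jdbcUrl(String host, String database, String encoding) {
        return "jdbc:mysql://" + host + "/" + database
                + "?useUnicode=true&characterEncoding=" + encoding;
    }

    // Inserts the names with a PreparedStatement so the driver handles
    // both escaping and charset conversion of the Java strings.
    // Table and column names are assumptions for this sketch.
    static void insertNames(Connection connection, List<String> names) throws SQLException {
        String sql = "INSERT INTO person (name) VALUES (?)";
        try (PreparedStatement statement = connection.prepareStatement(sql)) {
            for (String name : names) {
                statement.setString(1, name);
                statement.addBatch();
            }
            statement.executeBatch();
        }
    }

    // Typical use, given a reachable server and the MySQL driver on the classpath:
    //   String url = jdbcUrl("localhost:3306", "mydb", "UTF-8");
    //   try (Connection c = java.sql.DriverManager.getConnection(url, "user", "password")) {
    //       insertNames(c, java.util.Arrays.asList("H\u00e9l\u00e8ne"));
    //   }
}
```

Java strings are always UTF-16 in memory, so there is nothing to "enforce" on the INSERT statement itself. What matters is that the bytes were decoded correctly on the way in and that the driver knows which encoding to use on the way out.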
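
The last two suggestions (inspecting the table charset with SHOW and importing with LOAD DATA INFILE) can also be driven from Java if that is more convenient. The sketch below assumes the SHOW CREATE TABLE variant, a `;` field separator and `\n` line endings. The class and method names are hypothetical, and the table name and file path are expected to come from trusted input since they are concatenated into the SQL.

```java
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

// Hypothetical helper class, named for illustration only.
class MySqlCharsetTools {

    // Returns the CREATE TABLE statement for the table (second column of the
    // SHOW CREATE TABLE result), which names the table's default charset.
    static String showCreateTable(Connection connection, String table) throws SQLException {
        try (Statement statement = connection.createStatement();
             ResultSet rows = statement.executeQuery("SHOW CREATE TABLE " + table)) {
            return rows.next() ? rows.getString(2) : null;
        }
    }

    // Builds a LOAD DATA INFILE statement that tells MySQL how the file is
    // encoded. mysqlCharset is a MySQL charset name such as "latin1".
    static String loadDataSql(String csvPath, String table, String mysqlCharset) {
        return "LOAD DATA INFILE '" + csvPath + "' INTO TABLE " + table
                + " CHARACTER SET " + mysqlCharset
                + " FIELDS TERMINATED BY ';' LINES TERMINATED BY '\\n'";
    }
}
```

Note that plain LOAD DATA INFILE reads the file from the database server's filesystem, so the CSV has to be accessible there.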