I am writing some software that takes rows from an XLS file and inserts them into a database.
In OpenOffice, a cell looks like this:
Brunner Straße, Parzelle
I am using the ExcelFormat library from CodeProject.
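int type = cell->Type();  // STRING, WSTRING, INT, DOUBLE, ...
cout << "Cell contains " << type << endl;
// GetString() returns the 8-bit text, or a null pointer if the cell is not a STRING cell
const char* cellCharPtr = cell->GetString();
if (cellCharPtr != 0) {
    value.assign(cellCharPtr);  // value is a std::string
    cout << "normal string -> " << value << endl;
}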
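Cells of type WSTRING are not covered by the snippet above. The following is a minimal sketch for turning a wide string into UTF-8, assuming the library's GetWString() (the wide counterpart of GetString()) returns a const wchar_t*; the helper names wide_to_utf8 and append_utf8 are hypothetical. It works whether wchar_t is 16-bit (Windows, UTF-16 with surrogate pairs) or 32-bit (Linux, one code point per element).

```cpp
#include <cstddef>
#include <string>

// Append the UTF-8 encoding of one Unicode code point (<= U+10FFFF) to out.
static void append_utf8(std::string& out, char32_t cp) {
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}

// Convert a wide string to UTF-8. A high surrogate followed by a low
// surrogate is combined into one code point; unpaired surrogates and
// out-of-range values are replaced with U+FFFD.
std::string wide_to_utf8(const std::wstring& w) {
    std::string out;
    for (std::size_t i = 0; i < w.size(); ++i) {
        char32_t cp = static_cast<char32_t>(w[i]);
        if (cp >= 0xD800 && cp <= 0xDBFF && i + 1 < w.size()) {
            char32_t lo = static_cast<char32_t>(w[i + 1]);
            if (lo >= 0xDC00 && lo <= 0xDFFF) {
                cp = 0x10000 + ((cp - 0xD800) << 10) + (lo - 0xDC00);
                ++i;  // the low surrogate has been consumed as well
            }
        }
        if ((cp >= 0xD800 && cp <= 0xDFFF) || cp > 0x10FFFF)
            cp = 0xFFFD;  // replacement character
        append_utf8(out, cp);
    }
    return out;
}
```

For a WSTRING cell, something like if (const wchar_t* w = cell->GetWString()) value = wide_to_utf8(w); should then yield UTF-8, assuming GetWString() returns a null pointer for other cell types the way GetString() does.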
When fetched with the library, the string is returned as a char* (so cell->Type() returns STRING, not WSTRING), and on the console it now looks like this:
normal string -> Brunner Stra�e, Parzelle
hex string -> 42 72 75 6e 6e 65 72 20 53 74 72 61 ffffffdf 65 2c 20 50 61 72 7a 65 6c 6c 65
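The ffffffdf in the hex dump is most likely an artifact of printing a signed char: the byte is sign-extended to an int before being formatted. The stored byte is really just 0xDF. A minimal sketch of a dump that casts each byte to unsigned char first (hex_bytes is a hypothetical helper name):

```cpp
#include <cstdio>
#include <string>

// Format each byte of s as two lowercase hex digits separated by spaces.
// Casting to unsigned char first prevents the sign extension that turns
// 0xdf into ffffffdf when plain char is signed.
std::string hex_bytes(const std::string& s) {
    std::string out;
    char buf[4];  // two hex digits, a space, and the terminating NUL
    for (char c : s) {
        std::snprintf(buf, sizeof buf, "%02x ", static_cast<unsigned char>(c));
        out += buf;
    }
    if (!out.empty()) out.pop_back();  // drop the trailing space
    return out;
}
```

With this, the "Stra\xdf" "e" fragment (split into two literals so the hex escape stops after df) prints as 53 74 72 61 df 65.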
I insert it into the database using the MySQL C++ connector like so:
prep_stmt = con -> prepareStatement ("INSERT INTO "
+ tablename
+ "(crdate, jobid, imprownum, impid, impname, imppostcode, impcity, impstreet, imprest, imperror, imperrorstate)"
+ " VALUES(?,?,?,?,?,?,?,?,?,?,?)");
<...snip...>
prep_stmt->setString(8,vals["street"]);
<...snip...>
prep_stmt->execute();
Once inserted into the database (which uses the utf8_general_ci collation), the value looks like this:
Brunner Stra
which is annoying.
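The truncation is most likely MySQL rejecting invalid UTF-8: over a utf8 connection, the lone byte 0xDF announces a two-byte sequence, but the next byte, 0x65 ('e'), is not a continuation byte, so outside strict mode the server keeps only the text before it and issues a warning. A check before calling setString() makes the problem visible on the client side. This is a minimal sketch; is_valid_utf8 is a hypothetical helper, not part of the MySQL connector:

```cpp
#include <cstddef>
#include <string>

// Return true if s is well-formed UTF-8: correct lead/continuation bytes,
// no overlong encodings, no UTF-16 surrogates, nothing above U+10FFFF.
bool is_valid_utf8(const std::string& s) {
    static const unsigned int min_cp[] = {0, 0, 0x80, 0x800, 0x10000};
    std::size_t i = 0;
    const std::size_t n = s.size();
    while (i < n) {
        unsigned char c = static_cast<unsigned char>(s[i]);
        std::size_t len;
        unsigned int cp;
        if (c < 0x80) { ++i; continue; }                       // ASCII
        else if ((c & 0xE0) == 0xC0) { len = 2; cp = c & 0x1F; }
        else if ((c & 0xF0) == 0xE0) { len = 3; cp = c & 0x0F; }
        else if ((c & 0xF8) == 0xF0) { len = 4; cp = c & 0x07; }
        else return false;  // stray continuation byte or invalid lead byte
        if (i + len > n) return false;  // sequence cut off at end of string
        for (std::size_t k = 1; k < len; ++k) {
            unsigned char cc = static_cast<unsigned char>(s[i + k]);
            if ((cc & 0xC0) != 0x80) return false;  // not a continuation byte
            cp = (cp << 6) | (cc & 0x3F);
        }
        if (cp < min_cp[len] || (cp >= 0xD800 && cp <= 0xDFFF) || cp > 0x10FFFF)
            return false;  // overlong, surrogate, or out of range
        i += len;
    }
    return true;
}
```

The Latin-1 bytes from the cell ("Stra\xdf" "e") fail this check, while the UTF-8 spelling ("Stra\xc3\x9f" "e") passes.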
How do I make sure that, whatever locale (encoding) the file is in, the string is converted to UTF-8 when it is retrieved from the XLS file?
This will run as the backend of a web service where clients upload their own Excel files, so “Change the encoding of the file in LibreOffice” won’t work, I’m afraid.
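Your input seems to be encoded in Latin-1 (ISO-8859-1): the byte 0xDF in your hex dump is ß in Latin-1, whereas in UTF-8 ß is the two-byte sequence C3 9F. So you need to set the MySQL “connection charset” to latin1, and the server will then convert the text to UTF-8 itself.
I’m not familiar with the API you are using to connect to MySQL. In other APIs you’d add charset=latin1 to the connection URL or call an API function to set the connection encoding; at the SQL level, executing SET NAMES latin1 right after connecting has the same effect.
Alternatively, you can recode the input to UTF-8 before feeding it to MySQL.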
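Recoding is simple for Latin-1, because every Latin-1 byte has the same value as the Unicode code point it stands for. A minimal sketch (latin1_to_utf8 is a hypothetical helper name). If I understand the BIFF8 .xls format correctly, its 8-bit ("compressed") strings store only the low byte of each UTF-16 code unit, which is exactly Latin-1, so this mapping should be lossless for STRING cells. Older files that use a Windows code page such as 1252 differ from Latin-1 in the 0x80 to 0x9F range (for example, € is 0x80), and would need a table for those bytes.

```cpp
#include <string>

// Recode an ISO-8859-1 (Latin-1) byte string to UTF-8.
// Bytes below 0x80 are ASCII and copied unchanged; bytes 0x80-0xFF map to
// code points U+0080-U+00FF, which take two bytes in UTF-8.
std::string latin1_to_utf8(const std::string& in) {
    std::string out;
    out.reserve(in.size() * 2);  // worst case: every byte needs two
    for (char ch : in) {
        unsigned char c = static_cast<unsigned char>(ch);
        if (c < 0x80) {
            out += static_cast<char>(c);
        } else {
            out += static_cast<char>(0xC0 | (c >> 6));    // 110xxxxx
            out += static_cast<char>(0x80 | (c & 0x3F));  // 10xxxxxx
        }
    }
    return out;
}
```

Applied to the code in the question, this would be prep_stmt->setString(8, latin1_to_utf8(vals["street"])); and the 0xDF byte then reaches MySQL as C3 9F.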