I’ve written a program that converts a SQL Server table into a SQLite table. This is a C++ program using ADO (COM) to retrieve the data from SQL Server and the C SQLite interface (wrapped by my own C++ class).
In SQL Server, one record has a field containing:
HÄAGEN-DAZS
(That first A has two dots over it.) I read this field through ADO, convert it from a BSTR to a char*, and then bind it to an SQLite INSERT statement. When I look at this field in SQLiteSpy (and other tools), it appears as ‘H�AGEN DAZS’.
In the debugger, I can see that the Ä is character 0xc4, which is the correct UTF-8 representation for this character. It appears that SQLite is mangling my ‘Ä’.
This is my SQLite CREATE TABLE statement:
CREATE TABLE Company ([Lookup] CHAR (30))
This is my SQLite INSERT statement:
INSERT INTO Company ([Lookup]) VALUES (?)
I convert from the BSTR provided by ADO to a char* using this function call:
WideCharToMultiByte(CP_ACP,0,In_,-1,Out_,MaxLen_,0,0);
This is my SQLite Bind statement:
sqlite3_bind_text(Statement,1,Text_,-1, (BindFunction) SQLITE_TRANSIENT);
I have confirmed in the debugger that at this point, Text_ is “HÄAGEN-DAZS” and that the Ä really is character 0xc4.
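Instead of trusting a viewer, you can check whether the buffer is well-formed UTF-8 right before binding it. In UTF-8, a byte such as 0xC4 is the lead byte of a two-byte sequence and must be followed by a continuation byte in the range 0x80-0xBF. Here it is followed by 'A' (0x41), so the sequence is malformed, and viewers typically render the bad byte as the replacement character �. Below is a minimal sketch of such a check following the RFC 3629 rules; `is_valid_utf8` is a hypothetical helper, not part of SQLite or ADO.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: returns true if `s` is well-formed UTF-8 (RFC 3629).
// Rejects stray continuation bytes, truncated sequences, overlong forms,
// UTF-16 surrogates (U+D800..U+DFFF) and code points above U+10FFFF.
bool is_valid_utf8(const std::string& s) {
    // Smallest code point allowed for each sequence length (index = length).
    static const unsigned long min_cp[5] = {0, 0, 0x80, 0x800, 0x10000};
    std::size_t i = 0;
    const std::size_t n = s.size();
    while (i < n) {
        unsigned char c = static_cast<unsigned char>(s[i]);
        std::size_t len;
        unsigned long cp;
        if (c < 0x80) { ++i; continue; }                  // plain ASCII byte
        else if ((c & 0xE0) == 0xC0) { len = 2; cp = c & 0x1F; }
        else if ((c & 0xF0) == 0xE0) { len = 3; cp = c & 0x0F; }
        else if ((c & 0xF8) == 0xF0) { len = 4; cp = c & 0x07; }
        else return false;                                // not a valid lead byte
        if (i + len > n) return false;                    // sequence cut off
        for (std::size_t k = 1; k < len; ++k) {
            unsigned char cc = static_cast<unsigned char>(s[i + k]);
            if ((cc & 0xC0) != 0x80) return false;        // expected 10xxxxxx
            cp = (cp << 6) | (cc & 0x3F);
        }
        if (cp < min_cp[len] || (cp >= 0xD800 && cp <= 0xDFFF) || cp > 0x10FFFF)
            return false;                                 // overlong or out of range
        i += len;
    }
    return true;
}
```

For example, the bytes "H\xC4" "AGEN-DAZS" fail this check, while "H\xC3\x84" "AGEN-DAZS" pass. (The literals are split because a hex escape like \xC4A would otherwise swallow the following hex digit.)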
Any ideas as to what’s going on here?
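0xC4 is not UTF-8 for Ä. It is the ISO-8859-1 (Latin-1) encoding of Ä, which is why it also (sort of) matches the UTF-16 code unit U+00C4. The UTF-8 encoding is two bytes: 0xC3 0x84. The single byte comes from WideCharToMultiByte(CP_ACP, ...): CP_ACP converts to the system ANSI code page (typically Windows-1252, which agrees with Latin-1 for Ä), not to UTF-8, while sqlite3_bind_text expects UTF-8.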
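On Windows, the usual fix is to pass CP_UTF8 instead of CP_ACP to WideCharToMultiByte, or to bind the UTF-16 BSTR directly with sqlite3_bind_text16. The portable sketch below just shows the byte-level rule that turns U+00C4 into 0xC3 0x84. `encode_utf8` and `latin1_to_utf8` are hypothetical helper names, and the sketch assumes its input is a valid Unicode scalar value.

```cpp
#include <string>

// Hypothetical helper: encodes one Unicode code point as UTF-8 (RFC 3629).
// Assumes cp <= 0x10FFFF and is not a surrogate.
std::string encode_utf8(char32_t cp) {
    std::string out;
    if (cp < 0x80) {                       // 1 byte:  0xxxxxxx
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {               // 2 bytes: 110xxxxx 10xxxxxx
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {             // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {                               // 4 bytes: 11110xxx + 3 continuation bytes
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}

// Hypothetical helper: ISO-8859-1 bytes map one-to-one onto U+0000..U+00FF,
// so each byte can be re-encoded directly. (Windows-1252 differs from
// Latin-1 in 0x80-0x9F, so this is not a full CP_ACP converter.)
std::string latin1_to_utf8(const std::string& in) {
    std::string out;
    for (unsigned char b : in) out += encode_utf8(b);
    return out;
}
```

Ä is U+00C4 = 196. The top bits (196 >> 6 = 3) go into the lead byte 0xC0 | 3 = 0xC3, and the low six bits (196 & 0x3F = 4) go into the continuation byte 0x80 | 4 = 0x84.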