I’m working with a CSV that contains characters like:
” and •
I am reading the CSV via OleDb with the Microsoft.Jet.OLEDB.4.0 provider. When the data is loaded through the OleDbCommand, the characters are converted to the following, respectively:
“ and •
I suspected there might be a collation setting in the connection string but I was unable to find anything about this.
I can confirm the following:
- I can see the original character in the CSV when I open it.
- If I run a SELECT on the file through OleDb with WHERE [field] LIKE '%•%', I get 0 rows, but with WHERE [field] LIKE '%“%' I get rows returned.
Any thoughts?
Finally! Thanks to @HABJAN I was able to get to the resolution, which is as simple as setting CharacterSet in the Extended Properties of the connection string. In my situation the file was UTF-8, which is commonly the default in phpMyAdmin, where my data was retrieved from.
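To illustrate why the characters come out wrong, here is a minimal Python sketch of the usual mechanism: UTF-8 bytes decoded with a single-byte ANSI code page. Windows-1252 is an assumption here (the provider's actual default depends on the machine), and the helper name misread_as_ansi is made up for the example.

```python
# Sketch: reproduce the garbling that appears when UTF-8 bytes are decoded
# with a single-byte ANSI code page. Windows-1252 is an assumed default.

def misread_as_ansi(text: str, ansi_codec: str = "cp1252") -> str:
    """Encode text as UTF-8, then decode the bytes with the wrong codec."""
    raw = text.encode("utf-8")
    # errors="replace" because some bytes have no mapping in cp1252
    return raw.decode(ansi_codec, errors="replace")


bullet = "\u2022"  # the bullet character from the CSV
garbled = misread_as_ansi(bullet)
print(garbled)

# Reversing the mistake recovers the original character.
repaired = garbled.encode("cp1252").decode("utf-8")
print(repaired == bullet)
```

One character in the file becomes three after the misread, which would explain why a LIKE search for the original character finds nothing.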
Resulting working connection string:
The key is CharacterSet=65001, where 65001 is the Windows code page identifier for UTF-8 (see Code Page Identifiers). This might have been obvious to some collation-savvy individuals, but I’ve somehow managed to avoid these issues over the years and had never come across it in this respect.
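As a rough sketch of what such a connection string can look like, the Python snippet below just assembles the text. Only the provider name and CharacterSet=65001 come from this post. The folder path, HDR=Yes, FMT=Delimited, and the helper name build_text_connection_string are assumptions for illustration.

```python
# Sketch: assemble a Jet text-driver connection string with an explicit
# code page. The folder path, HDR and FMT values are illustrative assumptions.

def build_text_connection_string(folder: str, code_page: int = 65001) -> str:
    """Return a Jet OLEDB connection string for delimited text files."""
    extended = f"text;HDR=Yes;FMT=Delimited;CharacterSet={code_page};"
    return (
        "Provider=Microsoft.Jet.OLEDB.4.0;"
        f"Data Source={folder};"  # for text files this is the folder, not the file
        f'Extended Properties="{extended}"'
    )


conn_str = build_text_connection_string(r"C:\data\csv")  # hypothetical folder
print(conn_str)
```

The resulting string can then be handed to whatever OleDb connection object is in use. The whole Extended Properties value is wrapped in double quotes because it contains semicolons.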
I was also able to get HABJAN’s solution to work by following the documentation at https://learn.microsoft.com/en-us/sql/odbc/microsoft/schema-ini-file-text-file-driver and setting CharacterSet there to the same value as above.
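For the schema.ini route, a minimal sketch of generating the file next to the CSV might look like the following. The file name data.csv, the Format and ColNameHeader entries, and the helper names are assumptions. The numeric CharacterSet value is the one reported to work above.

```python
# Sketch: write a schema.ini beside the CSV so the text driver reads it as
# UTF-8. File name and the Format/ColNameHeader entries are assumptions.
import tempfile
from pathlib import Path


def schema_ini_text(csv_name: str, code_page: int = 65001) -> str:
    """Build the schema.ini section for one text file."""
    lines = [
        f"[{csv_name}]",        # section header is the data file's name
        "Format=CSVDelimited",
        "ColNameHeader=True",
        f"CharacterSet={code_page}",
    ]
    return "\r\n".join(lines) + "\r\n"


def write_schema_ini(folder: Path, csv_name: str) -> Path:
    """Write schema.ini into the folder that holds the CSV."""
    target = folder / "schema.ini"
    target.write_bytes(schema_ini_text(csv_name).encode("ascii"))
    return target


with tempfile.TemporaryDirectory() as tmp:
    written = write_schema_ini(Path(tmp), "data.csv")  # hypothetical file name
    content = written.read_bytes().decode("ascii")
    print(content)
```

The file has to sit in the same folder as the data file, with one section per file name.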
For my situation, setting CharacterSet in the connection string is the better method, as it is simpler and more maintainable, but +1 to HABJAN for helping me get there!
Thanks