Apologies, my question may be incorrect, but CLIENT_LOCALE seems like the current most-likely culprit:
I’ve created a Unicode database on my Informix server. We’re pretty sure (but frankly, not absolutely certain) that its DB_LOCALE is set to en_US.57372, because that’s what the query Select * from sysmaster:informix.sysdbslocale returns for that database. In that database I’ve created a table containing a char(20) column.
I have created a small dotnet program to connect to, INSERT to, and read back from my unicode table. I’ve done some testing and have specified a connection string including DB_LOCALE=en_US.57372 and have successfully inserted some unicode strings featuring accented characters outside the Latin1 set.
I’ve now moved on to testing some of the weirder-looking characters I’ve found on this random unicode character page. When I try these characters, though, I often receive the message ERROR [22001] [Informix .NET provider]String data right truncation, even when trying to send single characters.
I don’t understand why I would be getting this error when inserting single characters into a char(20) column, even if they are multi-byte Unicode characters.
As far as I’m concerned, I’m sending unicode characters and the database is receiving unicode characters. I’ve been through my connection code and verified that I am constructing an IFXCommand which successfully contains those characters I’m trying to send.
The only place I don’t currently have any certainty over is my CLIENT_LOCALE, which I currently have not set, because I don’t know what it should be.
So, my question is: If the problem is likely to be my lack of a CLIENT_LOCALE setting, where can I find what my CLIENT_LOCALE should be? Alternatively, if I’m stupidly missing the real problem, can someone explain where I’m screwing up?
EDIT: I’ve just had a thought. I tried inserting a character ỉ which I thought surely should have worked, but it didn’t. I then tested the string [five spaces]ỉ[five spaces] and that worked. No idea why.
EDIT2: “ᓌ” fails. “ ᓌ ” works. “ᓌᓌ” fails. “ ᓌᓌᓌ ” works. “ ᓌᓌᓌᓌ ” fails. The only thing that stands out to me on the character’s fileformat page is that this is a 3-byte UTF-8 character, whereas it’s a 2-byte UTF-16 character. Even then, though, “ ᓌᓌᓌᓌ ” should still fit within a char(20) field, to say nothing of the smaller strings which had failed. Can’t work this out at all…
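To make the size argument concrete, here is a minimal Python sketch that counts characters and encoded bytes for the strings from EDIT2. The helper name byte_lengths is made up for illustration, and the padding is assumed to be one space on each side as the strings appear above (the original whitespace may have been collapsed). It only measures the strings; it says nothing about how the Informix client or server actually converts them.

```python
# Hypothetical helper: measure a string in characters and in encoded bytes.
def byte_lengths(text):
    return {
        "chars": len(text),                      # Unicode code points
        "utf8": len(text.encode("utf-8")),       # bytes when encoded as UTF-8
        "utf16": len(text.encode("utf-16-le")),  # bytes as UTF-16, no BOM
    }

CH = "\u14cc"  # the character ᓌ from EDIT2

# Assumption: padding is a single space on each side, as shown in the post.
samples = [CH, " " + CH + " ", CH * 2, " " + CH * 3 + " ", " " + CH * 4 + " "]

for sample in samples:
    print(repr(sample), byte_lengths(sample))
```

Under that assumption, even the longest sample comes to 14 bytes in UTF-8, so none of these strings exceeds 20 bytes in either encoding, which is why the truncation error is surprising.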
So, here’s my “solution”.
I took the list of accented unicode characters found at http://lehelk.com/2011/05/06/script-to-remove-diacritics/
I then removed from that list all of the characters which are found in , then replaced everything which does not exist in ISO-8859-1 with a ?. It’s a nasty, scorched-earth solution, but dragging our database kicking and screaming into the early 1990s doesn’t seem like it’s on the cards.
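For anyone wanting to reproduce the last step, the following is a rough Python sketch of "replace everything that does not exist in ISO-8859-1 with ?". It is not my actual .NET code, and it skips the diacritic mapping from the list above entirely; the function name to_latin1_or_question_mark is invented for the example.

```python
# Hypothetical helper: keep characters that exist in ISO-8859-1 and
# substitute '?' for everything else.
def to_latin1_or_question_mark(text):
    # errors="replace" makes the encoder emit b"?" for each character
    # that has no ISO-8859-1 representation.
    encoded = text.encode("iso-8859-1", errors="replace")
    return encoded.decode("iso-8859-1")

if __name__ == "__main__":
    # \u00e9 (é) exists in ISO-8859-1; \u1ec9 (ỉ) and \u14cc (ᓌ) do not.
    print(to_latin1_or_question_mark("caf\u00e9 \u1ec9 \u14cc"))
```

The result is lossy by design: every character outside ISO-8859-1 collapses to the same ?, so the original text cannot be recovered from what gets stored.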