I’ve a small problem with Text Encodings.
I’ve two strings which I’m loading from a SQL Server 2008 database (nvarchar field).
After loading them from the database, Visual Studio 2010 displays them as follows in the watch window:
str1 = "Test"
str2 = "Test"
But the comparison str1 = str2 returns False.
If I write those strings to a file with UTF8 Encoding the result is as expected:
Test
Test
If I write those strings to a file with ANSI (Default) Encoding the result is NOT as expected:
?Test
Test
Converting the strings to bytes:
System.Text.Encoding.Default.GetBytes(str1) 'Returns ByteArray {63, 84, 101, 115, 116}
System.Text.Encoding.Default.GetBytes(str2) 'Returns ByteArray {84, 101, 115, 116}
System.Text.Encoding.UTF8.GetBytes(str1) 'Returns ByteArray {239, 187, 191, 84, 101, 115, 116}
System.Text.Encoding.UTF8.GetBytes(str2) 'Returns ByteArray {84, 101, 115, 116}
Where is the Byte 63 in case of ANSI Encoding OR Bytes 239, 187, 191 in case of UTF8 Encoding for str1 coming from?
Well, Bytes 239, 187, 191 are the BOM for UTF8. The question here would more likely be: Why do I get the BOM for str1 but not for str2?
(The values are passed to a webservice, which inserts them into the database; the initial values are passed to this webservice by a client I’ve no control over.)
Just so I’m clear, you do read the two strings from two different records in the database, right? Not from one record in two different ways?
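Well then, someone has stored a BOM in the one record. Since BOMs are invisible when you print them, you won’t see a visual difference unless you convert the string to an encoding that can’t represent a BOM.
That’s what happens above: str1 itself starts with the character U+FEFF. UTF-8 encodes that character as the bytes 239, 187, 191, while the ANSI code page has no representation for it and substitutes a question mark (byte 63).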
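If you want to see the invisible character for yourself, you can dump the UTF-16 code units of both strings. This is only a minimal sketch: the names BomDemo and DumpCodeUnits are made up for illustration, and str1 is built by hand as a stand-in for the value that comes out of the database.

```vbnet
Imports System
Imports System.Linq

Module BomDemo
    ' Returns the UTF-16 code units of a string as space-separated hex values.
    Function DumpCodeUnits(ByVal s As String) As String
        Return String.Join(" ", s.Select(Function(c) AscW(c).ToString("X4")).ToArray())
    End Function

    Sub Main()
        ' Assumption: str1 stands in for the database value that carries the BOM.
        Dim str1 As String = ChrW(&HFEFF) & "Test"
        Dim str2 As String = "Test"

        Console.WriteLine(str1.Length)          ' 5, although only four characters are visible
        Console.WriteLine(DumpCodeUnits(str1))  ' FEFF 0054 0065 0073 0074
        Console.WriteLine(DumpCodeUnits(str2))  ' 0054 0065 0073 0074
        Console.WriteLine(str1 = str2)          ' False with the default Option Compare Binary
    End Sub
End Module
```

The extra FEFF at the start of the first dump is the character that the watch window does not show.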
To solve this, you will need to clean up the database. Read every record, see if it starts with a BOM and, if so, write the content (without the BOM) back.
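The check itself could look roughly like the sketch below. StripLeadingBom is a hypothetical helper name, and reading and writing the records is left out because it depends on your table layout.

```vbnet
Imports System

Module BomCleanup
    ' U+FEFF is the character that UTF-8 encodes as the bytes 239, 187, 191.
    Private ReadOnly Bom As Char = ChrW(&HFEFF)

    ' Hypothetical helper: returns the value without a leading U+FEFF.
    ' Values that do not start with it (including Nothing and "") pass through unchanged.
    Function StripLeadingBom(ByVal value As String) As String
        If Not String.IsNullOrEmpty(value) AndAlso value(0) = Bom Then
            Return value.Substring(1)
        End If
        Return value
    End Function

    Sub Main()
        ' Assumption: dirty stands in for a value read from the affected record.
        Dim dirty As String = ChrW(&HFEFF) & "Test"
        Dim cleaned As String = StripLeadingBom(dirty)

        Console.WriteLine(cleaned.Length)    ' 4
        Console.WriteLine(cleaned = "Test")  ' True
    End Sub
End Module
```

You would only write a record back when the cleaned value differs from the stored one. The same helper could also be applied to incoming values before they are inserted, if you have access to that code.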
Edit: I only noticed later that you said this database was created on-the-fly by the webservice. In that case, the solution is to contact the author of the webservice and tell them they’ve got a bug in their routine.