I’m pulling some internationalized text from an MS SQL Server 2005 database. As per the defaults for that DB, the characters are stored as UCS-2. However, I need to output the data as UTF-8, since I’m sending it out over the web. Currently, I have the following code to do the conversion:
SqlString dbString = resultReader.GetSqlString(0);
byte[] dbBytes = dbString.GetUnicodeBytes();
byte[] utf8Bytes = System.Text.Encoding.Convert(System.Text.Encoding.Unicode,
System.Text.Encoding.UTF8, dbBytes);
System.Text.UTF8Encoding encoder = new System.Text.UTF8Encoding();
string outputString = encoder.GetString(utf8Bytes);
However, when I examine the output in the browser, it appears to be garbage, no matter what I set the encoding to.
What am I missing?
EDIT:
In response to the answers below: the reason I thought I had to perform a conversion is that I can output literal multibyte strings just fine. For example:
OutputControl.Text = "カルフォルニア工科大学とチューリッヒ工科大学は共同で、太陽光を保管可能な燃料に直接変えることのできる装置の開発に成功したとのこと";
works. Here, OutputControl is an ASP.Net Literal. However,
OutputControl.Text = outputString; //Output from above snippet
results in mangled output as described above. My hypothesis was that the database’s output was somehow getting mangled by ASP.Net. If that’s not the case, then what are some other possibilities?
EDIT 2:
Okay, I’m stupid. It turns out that there’s nothing wrong with the database at all. When I tried inserting my own literal double byte characters (材料,原料;木料), I could read and output them just fine even without any conversion process at all. It seems to me that whatever is inserting the data into the DB is mangling the characters somehow, so I’m going to look at that. With my verified, “clean” data, the following code works:
OutputControl.Text = dbString.ToString();
as the responses below indicate it should.
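One way to check whether a stored value was already mangled before it reached the database is to dump the UTF-16 code units of the string that comes back. The following is only a minimal sketch: the CodeUnitDump class and its Dump helper are hypothetical names, and the literal stands in for a value read via dbString.ToString(). Well-formed CJK text mostly falls in the U+4E00 to U+9FFF range, while long runs of values below U+0100 suggest that UTF-8 bytes were misread as a single-byte encoding at insert time.

```csharp
using System;
using System.Text;

class CodeUnitDump
{
    // Hypothetical helper: formats each UTF-16 code unit of a string as U+XXXX.
    static string Dump(string s)
    {
        StringBuilder sb = new StringBuilder();
        foreach (char c in s)
        {
            if (sb.Length > 0) sb.Append(' ');
            sb.Append("U+").Append(((int)c).ToString("X4"));
        }
        return sb.ToString();
    }

    static void Main()
    {
        // Stand-in for a value read from the database (assumption for illustration).
        string fromDb = "材料";
        Console.WriteLine(Dump(fromDb));
    }
}
```

Comparing the dump of a known-good literal with the dump of a suspect row shows whether the damage is already present in the stored data or is introduced later on output.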
Your code does essentially the same as taking the string straight from the SqlString (for example, via dbString.ToString()), with no conversion at all.
A .NET string is itself a Unicode string (specifically UTF-16, which is ‘almost’ the same as UCS-2, except for code points that do not fit into the lowest 16 bits). In other words, the conversions you are performing are redundant.
Your web app most likely mangles the encoding somewhere else as well, or sets a wrong encoding for the HTML output. However, that can’t be diagnosed from the information you provided so far.
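The redundancy described above can be checked with a small stand-alone program. This is a sketch under assumptions: the RoundTripDemo class name is made up, and a string literal replaces the SqlString from the question, since Encoding.Unicode.GetBytes on a .NET string yields the same kind of UTF-16 byte array that GetUnicodeBytes returns.

```csharp
using System;
using System.Text;

class RoundTripDemo
{
    static void Main()
    {
        // Stand-in for the value read from the database (assumption for illustration).
        string original = "材料,原料;木料";

        // Same steps as the question's snippet: UTF-16 bytes -> UTF-8 bytes -> string.
        byte[] utf16Bytes = Encoding.Unicode.GetBytes(original);
        byte[] utf8Bytes = Encoding.Convert(Encoding.Unicode, Encoding.UTF8, utf16Bytes);
        string roundTripped = Encoding.UTF8.GetString(utf8Bytes);

        // The round trip ends with the same UTF-16 string it started from.
        Console.WriteLine(original == roundTripped); // prints True
    }
}
```

Because the result is again an ordinary .NET string, the byte-level encoding actually sent to the browser is decided later, when the response is written, not by this conversion.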