We have a “Download to Word” feature in our application. Rather than creating an actual binary .doc file, we create an HTML document and set the MIME type to indicate that it’s a Word document. Here’s a stripped-down version of the method we’re using:
private FileContentResult ExportToWord( string htmlSource, string filename )
{
    StringBuilder doc = new StringBuilder();
    doc.Append( "<html><body>" );
    doc.Append( htmlSource );
    doc.Append( "</body></html>" );

    byte[] buffer = Encoding.UTF8.GetBytes( doc.ToString() );

    FileContentResult result = new FileContentResult( buffer, "application/msword" );
    result.FileDownloadName = string.Format( "{0}.doc", filename );
    return result;
}
In the above example htmlSource is the body of the document, so it would contain something like:
<p>This is the first paragraph.</p>
All of the above works just fine until we introduce non-ASCII characters into htmlSource. If htmlSource contains
<p>这是一个测试</p>
then in the Word document we get garbled text instead, along the lines of
è¿™æ˜¯ä¸€ä¸ªæµ‹è¯•
We’ve tried replacing Encoding.UTF8 with Encoding.Unicode and Encoding.UTF32, but in both cases Word ends up displaying the raw markup with a null/space between each character (and the Chinese strings still don’t show up correctly).
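The null/space characters are consistent with how those encodings lay out ASCII text: every character of the markup is followed by one or three zero bytes. The sketch below only dumps the bytes to make that visible. The class and method names (EncodingBytesSketch, HexOf) are hypothetical, and the idea that Word was reading these bytes as single-byte text is an assumption, not something confirmed here.

```csharp
using System;
using System.Text;

public static class EncodingBytesSketch
{
    // Hypothetical helper: returns the bytes of 'text' under 'encoding'
    // as a dash-separated hex string. GetBytes() does not emit a BOM.
    public static string HexOf( Encoding encoding, string text )
    {
        return BitConverter.ToString( encoding.GetBytes( text ) );
    }
}
```

For the three-character string "<p>", HexOf( Encoding.UTF8, "<p>" ) gives 3C-70-3E, while HexOf( Encoding.Unicode, "<p>" ) gives 3C-00-70-00-3E-00 and Encoding.UTF32 pads each character out to four bytes.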
I’ve also tried using Server.HtmlEncode against the Chinese string, but that gives me back the same string of Chinese characters.
I’m at a loss as to how to solve this problem.
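As it turns out, while finding the solution wasn’t easy, the actual implementation was pretty simple. We just changed the line that fills buffer: instead of holding only the bytes from Encoding.UTF8.GetBytes(), it now has the bytes returned by Encoding.UTF8.GetPreamble() in front of them.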
The GetPreamble() method returns the encoding’s byte order mark (for UTF-8, the bytes EF BB BF). Writing it at the start of the file tells Word how to interpret the contents: it can now determine that the file is UTF-8 and renders the Chinese text correctly instead of showing garbled characters.
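A minimal sketch of that change, assuming the preamble bytes are simply copied in front of the encoded HTML. The class and method names (WordExportSketch, EncodeWithBom) are hypothetical and are not from the original code; only Encoding.UTF8.GetPreamble() and Encoding.UTF8.GetBytes() come from the text above.

```csharp
using System;
using System.Text;

public static class WordExportSketch
{
    // Hypothetical helper: encodes 'html' as UTF-8 and prefixes the result
    // with the UTF-8 byte order mark so the consumer can detect the encoding.
    public static byte[] EncodeWithBom( string html )
    {
        byte[] preamble = Encoding.UTF8.GetPreamble();   // EF BB BF
        byte[] body = Encoding.UTF8.GetBytes( html );

        // Concatenate: BOM first, then the encoded document.
        byte[] buffer = new byte[preamble.Length + body.Length];
        Buffer.BlockCopy( preamble, 0, buffer, 0, preamble.Length );
        Buffer.BlockCopy( body, 0, buffer, preamble.Length, body.Length );
        return buffer;
    }
}
```

In ExportToWord, the buffer line would then read something like byte[] buffer = WordExportSketch.EncodeWithBom( doc.ToString() ); and the rest of the method stays as it was.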