I'm trying to read Unicode characters from a Word document, but I get question marks (????) instead.
Here is my code:
Microsoft.Office.Interop.Word.Application word = new Microsoft.Office.Interop.Word.Application();
object miss = System.Reflection.Missing.Value;
object enc = Microsoft.Office.Core.MsoEncoding.msoEncodingEUCJapanese;
object path = @"C:\Users\file.doc";
object readOnly = true;
Microsoft.Office.Interop.Word.Document docs = word.Documents.Open(ref path, ref miss, ref readOnly, ref miss, ref miss,
ref miss, ref miss, ref miss, ref miss, ref miss, ref enc, ref miss, ref miss, ref miss, ref miss, ref miss);
string totaltext = "";
// Word's Paragraphs collection is 1-based, hence the i + 1 below.
for (int i = 0; i < docs.Paragraphs.Count; i++)
{
    totaltext += " \r\n " + docs.Paragraphs[i + 1].Range.Text.ToString();
    Console.WriteLine(totaltext);
}
// Console.WriteLine(totaltext);
docs.Close();
word.Quit();
Given the comments, it sounds like the problem may well just be with
Console.WriteLine. Try writing to a file instead:
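A minimal sketch of that, assuming totaltext is the string built by the loop in the question; the file name "output.txt" is only a placeholder:

```csharp
using System.IO;
using System.Text;

// Place this after the loop, instead of Console.WriteLine(totaltext).
// "output.txt" is a placeholder path; use whatever location suits you.
// Passing Encoding.UTF8 explicitly writes the text as UTF-8.
File.WriteAllText("output.txt", totaltext, Encoding.UTF8);
```

If the characters show up correctly in the file, the text was read from the document correctly and only the console display is at fault.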
Then open the file in Notepad, specifying UTF-8 as the encoding, and I suspect you’ll see everything correctly.