I am using C# and Microsoft Word 12.0 object library to read data from


Asked: May 23, 2026 (2026-05-23T22:46:28+00:00)


I am using C# and the Microsoft Word 12.0 Object Library to read data from a .doc file and then save the content to a text file (this is required by my project). My .doc file has some tables, and I need to read each row and column in those tables.
The read operations execute successfully, but the data contains some strange characters (they look like squares), as in the attached image:

enter image description here

Here is the code I used:

private void btnRead_Click(object sender, EventArgs e)
{
    try
    {
        Microsoft.Office.Interop.Word.ApplicationClass wordObject = new ApplicationClass();
        object file = textBox1.Text; //this is the path
        object nullobject = System.Reflection.Missing.Value;
        Microsoft.Office.Interop.Word.Document docs = wordObject.Documents.Open
            (ref file, ref nullobject, ref nullobject, ref nullobject,
            ref nullobject, ref nullobject, ref nullobject, ref nullobject,
            ref nullobject, ref nullobject, ref nullobject, ref nullobject,
            ref nullobject, ref nullobject, ref nullobject, ref nullobject);

        docs.ActiveWindow.Selection.WholeStory();
        docs.ActiveWindow.Selection.Copy();
        IDataObject data = Clipboard.GetDataObject();
        String allData = "";
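        // Word collections are 1-based, so the upper bounds must be inclusive;
        // with "<" the last table and the last row of each table are skipped.
        for (int t = 1; t <= docs.Tables.Count; t++)
        {
            Table tbl = docs.Tables[t];
            for (int r = 1; r <= tbl.Rows.Count; r++)
            {
                // Only the first two columns of each row are read.
                for (int c = 1; c < 3; c++)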
                {
                    allData += tbl.Cell(r, c).Range.FormattedText.Text.Trim() + Environment.NewLine;
                }
            }
        }
        txtData.Text = allData;
        saveTextFile(allData);

        docs.Close(ref nullobject, ref nullobject, ref nullobject);
    }
    catch (Exception j)
    {
        MessageBox.Show(j.Message);
    }
}

private void saveTextFile(String data)
{
    try
    {
        // "using" flushes and closes the writer even if WriteLine throws.
        using (StreamWriter sw = new StreamWriter(txtOutput.Text.Trim()))
        {
            sw.WriteLine(data);
        }
    }
    catch (Exception ex)
    {
        MessageBox.Show(ex.StackTrace);
    }
}

Does anyone have any idea how I can remove these strange characters?


1 Answer

Editorial Team · Answer 1 · 2026-05-23T22:46:29+00:00

I'm not very familiar with the .doc format specifically, but those boxes (the "strange characters") are generally displayed when a character falls outside the printable character set. Since there are always two of them at the end of each line, they might be newline characters in the document (or the result of a newline-related parsing error), such as \r\n. \r\n is common in Windows-formatted text, though whether that is what .doc files use is beyond my expertise.
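One way to check this guess rather than rely on it is to print the code point of every character in a cell's text. The sketch below is only an illustration: `CharDump` and `Describe` are hypothetical names, not part of the Word object library or of the code in the question.

```csharp
using System;
using System.Linq;

static class CharDump
{
    // Hypothetical helper: renders each character of s as a U+XXXX code point,
    // so non-printable characters become visible instead of showing as boxes.
    public static string Describe(string s)
    {
        return string.Join(" ", s.Select(ch => "U+" + ((int)ch).ToString("X4")).ToArray());
    }
}
```

Passing the text of a single cell (for example `tbl.Cell(r, c).Range.Text` from the question) to `CharDump.Describe` and showing the result in a message box should reveal which two characters sit at the end of each value.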

Of course, removing them should be relatively trivial if you're happy to hack it: you could simply add a check that deletes the last two characters of every line. It's not pretty (and I'd probably recommend against it on principle), but it looks like it would work.
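A slightly safer variant of that hack is to trim specific characters instead of a fixed count. As far as I know, Word ends the text of every table cell with an end-of-cell marker made of a carriage return followed by U+0007 (`\r\a`), and `String.Trim()` does not treat U+0007 as whitespace, which would explain why two boxes survive the `Trim()` call in the question. The sketch below assumes that this is what is happening; `CellText` and `Clean` are hypothetical names introduced only for illustration.

```csharp
using System;

static class CellText
{
    // Hypothetical helper: removes the trailing carriage return (U+000D) and
    // bell (U+0007) characters assumed to form Word's end-of-cell marker,
    // then trims ordinary whitespace from both ends.
    public static string Clean(string raw)
    {
        if (raw == null)
        {
            return string.Empty;
        }
        return raw.TrimEnd('\r', '\a').Trim();
    }
}
```

In the loop from the question this would be used as `allData += CellText.Clean(tbl.Cell(r, c).Range.Text) + Environment.NewLine;`. Only trailing characters are removed, so paragraph marks inside a multi-paragraph cell are left untouched.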
