I have a chunk of code that I’m using to read MS Office Word documents.
static void ReadMSOfficeWordFile(string file) {
    try {
        Microsoft.Office.Interop.Word.Application msWordApp = new Microsoft.Office.Interop.Word.Application();
        object nullobj = System.Reflection.Missing.Value;
        object ofalse = false;
        object ofile = file;
        Microsoft.Office.Interop.Word.Document doc = msWordApp.Documents.Open(
            ref ofile, ref nullobj, ref nullobj,
            ref nullobj, ref nullobj, ref nullobj,
            ref nullobj, ref nullobj, ref nullobj,
            ref nullobj, ref nullobj, ref nullobj,
            ref nullobj, ref nullobj, ref nullobj,
            ref nullobj);
        string result = doc.Content.Text.Trim();
        doc.Close(ref ofalse, ref nullobj, ref nullobj);
        msWordApp.Quit();
        CheckLineMatch(file, result);
    }
    catch {
        RaiseError("Unable to parse file because of MS Office error.", file);
    }
}
I have three issues with this.
First, it relies on MS Office being installed on every system this might run on. Some people prefer LibreOffice, but this still needs to read MS Office Word documents.
Second, I don’t know whether this will even work for both MS Office 2003 and MS Office 2007 documents.
Third, it’s slow. Excruciatingly slow.
So I assume there must be a better way to do this, and I’m guessing someone knows a better approach than what a novice comes up with. I’m only trying to read the text in the document, nothing else.
Regarding your “Word application hanging open” problem: you need to tell Word to close explicitly.
See http://msdn.microsoft.com/en-us/library/bb215475(v=office.12).aspx
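As a rough sketch of what "telling it to close" can look like: put the Close and Quit calls in a finally block so that they run even when opening or reading the document throws. The class and method names here (WordText, Read) are made up for illustration, and the code assumes the same Word interop assembly the question already references.

```csharp
using System;
using System.Runtime.InteropServices;
using Word = Microsoft.Office.Interop.Word;

static class WordText   // hypothetical helper, not part of the interop library
{
    public static string Read(string file)
    {
        object missing = System.Reflection.Missing.Value;
        object noSave = false;
        object path = file;
        Word.Application app = null;
        Word.Document doc = null;
        try
        {
            app = new Word.Application();
            doc = app.Documents.Open(
                ref path, ref missing, ref missing, ref missing,
                ref missing, ref missing, ref missing, ref missing,
                ref missing, ref missing, ref missing, ref missing,
                ref missing, ref missing, ref missing, ref missing);
            return doc.Content.Text.Trim();
        }
        finally
        {
            try
            {
                // Close the document without saving, if it was opened.
                if (doc != null) doc.Close(ref noSave, ref missing, ref missing);
            }
            finally
            {
                if (app != null)
                {
                    // Quit Word even if Close failed, then release the COM reference.
                    app.Quit(ref noSave, ref missing, ref missing);
                    Marshal.ReleaseComObject(app);
                }
            }
        }
    }
}
```

With this shape, the caller can wrap WordText.Read(file) in its own try/catch for error reporting, and a failure while parsing should not leave a WINWORD.EXE process behind.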
Regarding “relies on MS Office being installed”: you are using the interop, which by definition requires Office to be installed. To avoid that, you can look into one of the commercial libraries:
http://www.aspose.com/categories/.net-components/aspose.words-for-.net/default.aspx
http://www.gemboxsoftware.com/document/pricelist
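For the Office 2007 format specifically, there is also a way to get at the text without Office or a third-party library: a .docx file is a ZIP package, and the body text is stored as XML inside it. The sketch below is only a starting point under some assumptions: it needs .NET 4.5 or later for System.IO.Compression.ZipFile, it assumes the main part sits at the usual location word/document.xml, and it does not handle the binary .doc format used by Office 2003 at all. The names DocxText and Read are made up for illustration.

```csharp
using System;
using System.IO;
using System.IO.Compression;   // needs references to System.IO.Compression and System.IO.Compression.FileSystem
using System.Text;
using System.Xml.Linq;

static class DocxText   // hypothetical helper
{
    // WordprocessingML namespace used by elements in word/document.xml.
    static readonly XNamespace W =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Returns the body text, one line per paragraph.
    public static string Read(string path)
    {
        using (ZipArchive zip = ZipFile.OpenRead(path))
        {
            // Assumption: the main document part is at its usual location.
            ZipArchiveEntry entry = zip.GetEntry("word/document.xml");
            if (entry == null)
                throw new InvalidDataException("Not a .docx package: " + path);

            using (Stream stream = entry.Open())
            {
                XDocument xml = XDocument.Load(stream);
                StringBuilder sb = new StringBuilder();
                foreach (XElement paragraph in xml.Descendants(W + "p"))
                {
                    // Concatenate the text runs (w:t) of this paragraph.
                    foreach (XElement text in paragraph.Descendants(W + "t"))
                        sb.Append(text.Value);
                    sb.AppendLine();
                }
                return sb.ToString().Trim();
            }
        }
    }
}
```

This ignores headers, footers, footnotes, tabs and manual line breaks, and text inside text boxes may show up twice, so it will not match doc.Content.Text exactly. It does avoid starting Word for every file, which is where most of the time goes in the interop version.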