At work, I have two Word documents that are different versions of the same document. I want to compare them and extract the differences. My plan is to convert each Word document to a text file and then diff the results. Is this possible? Thanks.
If you have Word installed on the system, you can use the Word ActiveX (COM) automation object to extract the text. Use this simple, untested code to get you started.
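A minimal sketch of the automation approach in Python, assuming the third-party pywin32 package and a local Word installation. The function name `extract_text_with_word` and its optional `word` parameter are illustrative, not part of any library. Treat it as a starting point; it has not been run against a real Word install.

```python
import os


def extract_text_with_word(path, word=None):
    """Return the plain text of a Word document via COM automation.

    `word` is an optional, already running Word.Application object
    (hypothetical parameter, handy for reuse across many files).
    """
    owns_word = word is None
    if owns_word:
        # Imported lazily: pywin32 only exists on Windows.
        import win32com.client
        word = win32com.client.Dispatch("Word.Application")
        word.Visible = False

    doc = None
    try:
        # Word resolves relative paths against its own working directory,
        # so pass an absolute path.
        # Positional arguments: FileName, ConfirmConversions, ReadOnly.
        doc = word.Documents.Open(os.path.abspath(path), False, True)
        return doc.Content.Text
    finally:
        if doc is not None:
            doc.Close(False)  # False: do not save changes
        if owns_word:
            word.Quit()
```

Word separates paragraphs in `Content.Text` with carriage returns, so splitting the result on `"\r"` before diffing gives one line per paragraph.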
If you don’t have Word, or don’t want to require your users to have Word installed, you can still extract the text with a little more effort. The newer .docx format that Word uses is nothing more than Office Open XML files in a zip archive. So you just need to unzip the .docx file, look in the word folder for the XML file representing the document contents (normally word/document.xml), and extract the text by parsing the XML (DOM or SAX or PORO or ..).
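The unzip-and-parse steps above can be sketched with the Python standard library alone. This assumes the main document part is at `word/document.xml` and only collects text from `w:t` elements, so tabs, line breaks, headers, footnotes, and text boxes are ignored or handled only roughly. The helper names `docx_to_lines` and `diff_docx` are made up for this example.

```python
import difflib
import xml.etree.ElementTree as ET
import zipfile

# WordprocessingML namespace used by elements in word/document.xml.
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def docx_to_lines(path):
    """Return the document text as a list, one string per paragraph."""
    with zipfile.ZipFile(path) as archive:
        xml_bytes = archive.read("word/document.xml")
    root = ET.fromstring(xml_bytes)
    lines = []
    for para in root.iter(W_NS + "p"):
        # A paragraph's text is split across runs; join every w:t in it.
        lines.append("".join(t.text or "" for t in para.iter(W_NS + "t")))
    return lines


def diff_docx(old_path, new_path):
    """Return a unified diff of two .docx files, paragraph by paragraph."""
    return list(difflib.unified_diff(
        docx_to_lines(old_path),
        docx_to_lines(new_path),
        fromfile=old_path,
        tofile=new_path,
        lineterm="",
    ))
```

Calling `print("\n".join(diff_docx("old.docx", "new.docx")))` (with placeholder file names) prints the changed paragraphs in the familiar `-`/`+` format. Diffing per paragraph rather than per raw XML line keeps formatting-only changes out of the result.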