I receive Word documents whose formatting corresponds to the kind of data in them. For example, all headers have exactly the same formatting (Times New Roman, 14 pt, bold).
What is the best way to convert such MS Word documents (.doc or .docx) into XML documents? Language is not an issue (I’ll use Lisp/Boost.Spirit if I have to!).
So far I have used a very inefficient conditional search in VBA to literally copy the document into a second document, which I then saved with a .xml extension. It got the job done, but it's ugly.
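
To make the formatting-to-element mapping concrete, here is a minimal sketch in Python using only the standard library. It is not the VBA approach described above. It assumes a .docx file (a zip archive containing word/document.xml; legacy .doc files would need converting first) and assumes the font, size, and bold flag are set directly on each run rather than inherited from a named style. The output element names `header` and `para` and the function names are made up for illustration.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used inside word/document.xml
W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W}


def is_header_run(run):
    """True if the run is directly formatted as Times New Roman, 14 pt, bold."""
    props = run.find("w:rPr", NS)
    if props is None:
        return False
    fonts = props.find("w:rFonts", NS)
    size = props.find("w:sz", NS)
    bold = props.find("w:b", NS)
    return (
        fonts is not None
        and fonts.get(f"{{{W}}}ascii") == "Times New Roman"
        and size is not None
        and size.get(f"{{{W}}}val") == "28"  # w:sz is in half-points: 28 = 14 pt
        and bold is not None
        and bold.get(f"{{{W}}}val", "true") not in ("0", "false")
    )


def docx_to_xml(source):
    """Map each non-empty paragraph to <header> or <para> (hypothetical names).

    `source` is a path or a binary file-like object holding a .docx file.
    """
    with zipfile.ZipFile(source) as archive:
        body = ET.fromstring(archive.read("word/document.xml"))

    root = ET.Element("document")
    for para in body.iter(f"{{{W}}}p"):
        runs = para.findall("w:r", NS)  # direct child runs only
        text = "".join(t.text or "" for r in runs for t in r.findall("w:t", NS))
        if not text.strip():
            continue  # skip empty paragraphs
        is_header = bool(runs) and all(is_header_run(r) for r in runs)
        ET.SubElement(root, "header" if is_header else "para").text = text
    return ET.tostring(root, encoding="unicode")
```

Calling `docx_to_xml("report.docx")` (file name hypothetical) would return the XML as a string. If the documents apply their formatting through named styles instead, the check would have to look at the paragraph's style reference rather than the run properties.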