I want a user to be able to upload a Word document and have my program split it into separate Word documents. The problem is that the split points have to be marked manually, because the documents are not all formatted the same way. My initial thought is that, before uploading, the user tags each section with a beginning and end tag (of some sort, maybe a comment) that my program can then parse to split the document into separate documents. (This also needs to work for both .doc and .docx, so a common solution is desirable.)
Ex. Input:
Doc1
Chapter 1
Blah Blah Blah
Chapter 2
Blah blah
/end Doc1
Ex. Output:
Doc1
Chapter 1
Blah Blah Blah
/end Doc1
Doc2
Chapter 2
Blah blah
/end Doc2
Any ideas? I have been struggling with this for a while.
What you want to do is non-trivial! I have done my fair share of document manipulation. That said, if you are working with DOCX it is not too bad these days thanks to the supporting libraries; see:
http://openxmldeveloper.org/
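For the DOCX case, here is a rough sketch of the parsing half only, using nothing but the Python standard library. It assumes the markers are typed as ordinary paragraphs in the form shown in the question ("Doc1" ... "/end Doc1"); that marker convention, and the function names paragraph_texts and split_by_markers, are illustrative assumptions, not part of any library. A .docx file is a zip package whose main text lives in word/document.xml, so the sketch reads that part and groups paragraph text by marker. It does not write the resulting documents and it does not handle the binary .doc format.

```python
import re
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by word/document.xml inside a .docx package.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

# Assumed marker convention, taken from the example in the question:
# a paragraph "Doc1" opens a section and a paragraph "/end Doc1" closes it.
START_RE = re.compile(r"^(Doc\s*\d+)$")
END_RE = re.compile(r"^/end\s+(Doc\s*\d+)$")


def paragraph_texts(docx_file):
    """Return the plain text of every paragraph in the document body.

    docx_file may be a path or a binary file-like object. Paragraphs
    nested inside tables are included as well.
    """
    with zipfile.ZipFile(docx_file) as package:
        root = ET.fromstring(package.read("word/document.xml"))
    body = root.find(f"{{{W_NS}}}body")
    texts = []
    for para in body.iter(f"{{{W_NS}}}p"):
        # A paragraph's text is spread over one or more w:t elements.
        texts.append("".join(t.text or "" for t in para.iter(f"{{{W_NS}}}t")))
    return texts


def split_by_markers(texts):
    """Group paragraph texts into {section name: [paragraphs]}.

    Marker paragraphs themselves are dropped, and so is any text that
    sits outside a start/end pair.
    """
    sections = {}
    current = None
    for text in texts:
        stripped = text.strip()
        start = START_RE.match(stripped)
        end = END_RE.match(stripped)
        if end:
            current = None
        elif start and current is None:
            current = start.group(1)
            sections[current] = []
        elif current is not None:
            sections[current].append(text)
    return sections
```

Something like split_by_markers(paragraph_texts("upload.docx")), where "upload.docx" is a placeholder file name, would then give one list of paragraphs per tagged section. Producing real output documents that keep their formatting means copying the paragraph XML and the rest of the package (styles, numbering, images), which is where a proper Open XML library earns its keep.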
Older versions (.doc) are more difficult: you would need to source a library for that format or, as suggested, use macros.
Is the “program” a web site? If so, make sure you do not use COM interop! Microsoft does not recommend or support automating Office applications on a server.