In one part of my app, users can copy a Word or RTF document and paste it into a textbox on a form. When the form is submitted, any images and a lot of the formatting are stripped out of the form field content.
I want to achieve the same result by reading directly from the file rather than through a manual form submit, i.e. strip out the hidden characters and image data and leave just the text and line feeds / carriage returns.
How can I achieve a similar thing?
If you just want to extract the text from Word documents, you could try POI. CF9 already includes a version that can handle most .doc or .docx files (it does not handle .rtf files). For CF8, you will need to use JavaLoader to load a newer version. See also: Reading Office documents with ColdFusion (2).
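Since POI does not cover .rtf, one possible way to handle those files is the RTF reader that ships with the JDK (`javax.swing.text.rtf.RTFEditorKit`). The following is only a minimal sketch, not something from the linked article: the class name `RtfText` and the method `extract` are made up for illustration, and I am assuming the JDK parser copes with your documents (it is fairly basic, so complex RTF may not come through cleanly).

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.swing.text.BadLocationException;
import javax.swing.text.Document;
import javax.swing.text.rtf.RTFEditorKit;

// Hypothetical helper: RtfText and extract() are illustrative names only.
public class RtfText {

    /** Parses RTF from the stream and returns only the plain text content. */
    public static String extract(InputStream in)
            throws IOException, BadLocationException {
        RTFEditorKit kit = new RTFEditorKit();
        Document doc = kit.createDefaultDocument();
        // Read the RTF into the (initially empty) document at position 0.
        kit.read(in, doc, 0);
        // The document model keeps the text and paragraph breaks;
        // control words such as \b are not part of the returned string.
        return doc.getText(0, doc.getLength());
    }

    public static void main(String[] args) throws Exception {
        // Tiny inline sample; in practice you would pass a FileInputStream.
        String rtf = "{\\rtf1\\ansi Hello \\b World\\b0\\par}";
        InputStream in =
                new ByteArrayInputStream(rtf.getBytes(StandardCharsets.US_ASCII));
        System.out.println(extract(in));
    }
}
```

Because ColdFusion runs on the JVM, the same JDK classes should be reachable from CFML with `createObject("java", "javax.swing.text.rtf.RTFEditorKit")`, so a compiled helper like this may not even be needed.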