I am trying to use Apache POI to extract embedded equations and text from a .doc MS Word file into a .ppt MS PowerPoint file. I have successfully extracted the text, but how do I extract the embedded equations?
If I extract the document as text only, each embedded equation comes out like this:
!!EMBED Equation.3
This may not help you with the binary .doc format, but for the newer .docx format, I was able to get to the equation, which is embedded as an OLE document, using the following code:
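The original snippet is not reproduced here, so what follows is only a minimal sketch of the approach described, not the answerer's exact code. It assumes Apache POI 4.0 or later (where `getAllEmbeddedParts()` is available on `XWPFDocument`), and it assumes the equation data sits in an OLE stream named `Equation Native`, which is the usual name for Equation 3.0 and MathType objects. The class and method names (`EquationExtractor`, `extractEquationStreams`) are made up for illustration.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;

import org.apache.poi.openxml4j.exceptions.OpenXML4JException;
import org.apache.poi.openxml4j.opc.PackagePart;
import org.apache.poi.poifs.filesystem.DirectoryNode;
import org.apache.poi.poifs.filesystem.POIFSFileSystem;
import org.apache.poi.util.IOUtils;
import org.apache.poi.xwpf.usermodel.XWPFDocument;

// Hypothetical helper: collects the raw equation streams of all OLE objects in a .docx.
public class EquationExtractor {

    // Content type that .docx packages use for embedded OLE objects.
    static final String OLE_CONTENT_TYPE =
            "application/vnd.openxmlformats-officedocument.oleObject";

    // Assumption: Equation 3.0 / MathType objects keep their data in this stream.
    static final String EQUATION_STREAM = "Equation Native";

    public static List<byte[]> extractEquationStreams(InputStream docx)
            throws IOException, OpenXML4JException {
        List<byte[]> streams = new ArrayList<>();
        try (XWPFDocument doc = new XWPFDocument(docx)) {
            for (PackagePart part : doc.getAllEmbeddedParts()) {
                // Skip embedded parts that are not OLE2 containers (e.g. embedded .xlsx).
                if (!OLE_CONTENT_TYPE.equals(part.getContentType())) {
                    continue;
                }
                try (InputStream in = part.getInputStream();
                     POIFSFileSystem fs = new POIFSFileSystem(in)) {
                    DirectoryNode root = fs.getRoot();
                    // Skip OLE objects that are not equations.
                    if (!root.hasEntry(EQUATION_STREAM)) {
                        continue;
                    }
                    try (InputStream eq = root.createDocumentInputStream(EQUATION_STREAM)) {
                        streams.add(IOUtils.toByteArray(eq));
                    }
                }
            }
        }
        return streams;
    }
}
```

Calling `EquationExtractor.extractEquationStreams(new FileInputStream("input.docx"))` (the file name is a placeholder) should return one byte array per embedded equation, in the order POI lists the embedded parts.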
You can then extract the MathType data from that OLE document and hand it to an MTEF parser.
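As a rough illustration of that extraction step: the `Equation Native` stream normally starts with a small header (`EQNOLEFILEHDR` in the MathType SDK documentation, 28 bytes, little-endian) whose first field is the header length and whose field at byte offset 8 is the length of the MTEF data that follows. The sketch below assumes that layout; the class and method names (`EquationNative`, `stripHeader`) are hypothetical, and the MTEF parser itself is out of scope.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Arrays;

// Hypothetical helper: separates the MTEF payload from the "Equation Native" header.
public class EquationNative {

    // Smallest prefix that contains both fields read below (cbHdr and cbObject).
    private static final int MIN_HEADER_BYTES = 12;

    public static byte[] stripHeader(byte[] stream) {
        if (stream.length < MIN_HEADER_BYTES) {
            throw new IllegalArgumentException("stream too short for an equation header");
        }
        ByteBuffer buf = ByteBuffer.wrap(stream).order(ByteOrder.LITTLE_ENDIAN);

        // Assumed layout: WORD cbHdr at offset 0, DWORD cbObject at offset 8.
        int headerLength = buf.getShort(0) & 0xFFFF;
        long mtefLength = buf.getInt(8) & 0xFFFFFFFFL;

        if (headerLength < MIN_HEADER_BYTES || headerLength + mtefLength > stream.length) {
            throw new IllegalArgumentException("inconsistent equation header");
        }
        // Everything after the header, up to the declared length, is the MTEF data.
        return Arrays.copyOfRange(stream, headerLength, (int) (headerLength + mtefLength));
    }
}
```

The returned bytes are what an MTEF parser would expect as input; validating the header first avoids handing it a truncated or unrelated stream.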
If you don’t need the MathType data, there is also a placeholder image (in WMF format) that just renders the equation.