Is there a programmatic way to extract equations (and possibly images) from an MS Word document? I’ve googled all over, but have yet to find anything that I can sink my teeth into and work from. If possible, I’d like to be able to do this with VB.NET or C#, but I can pick up enough of any language to hack out a DLL. Thanks!
EDIT: Right now I’m looking at extracting the equations from Word 2003, but if converting it to 2007/Open XML is required, that’s fine.
I don’t know if any of this will help, but the object model in Word 2000/2003 has an InlineShapes collection as part of the Document object. It represents embedded images and possibly similar objects such as equations. Copying the first item onto the clipboard from VBA (for example, ActiveDocument.InlineShapes(1).Select followed by Selection.Copy) might help you extract them.
It’s accessible from .NET too, through the Word interop assemblies (see MSDN).
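The question’s edit says converting to Word 2007/Open XML is acceptable, so here is a minimal Python sketch of that route. It uses only the standard library and is an alternative to the COM approach, not part of it. A .docx file is a ZIP package. Equations built with Word 2007’s equation editor are stored as OMML `<m:oMath>` elements in word/document.xml, and pictures live under word/media/. One assumption you should check against your own files: equations created with the old Equation Editor 3.0 in Word 2003 generally remain embedded OLE objects after conversion (parts under word/embeddings/, usually with a preview image in word/media/) and do not become OMML. The function names extract_from_docx and omml_text are hypothetical. Headers, footers and footnotes live in other parts and are not scanned here.

```python
import io
import sys
import zipfile
import xml.etree.ElementTree as ET

# Office Math Markup Language (OMML) namespace used for equations in .docx files.
M_NS = "http://schemas.openxmlformats.org/officeDocument/2006/math"
# Serialize equations with an "m:" prefix instead of ElementTree's default "ns0:".
ET.register_namespace("m", M_NS)


def extract_from_docx(docx):
    """Pull equations, images and embedded OLE parts out of a .docx package.

    `docx` may be a file path, a binary file-like object, or raw bytes.
    Returns a dict with:
      "equations":   list of <m:oMath> elements serialized as XML strings
      "images":      dict of part name -> bytes for everything under word/media/
      "ole_objects": list of part names under word/embeddings/ (assumed to hold
                     legacy objects such as Equation Editor 3.0 equations)
    """
    if isinstance(docx, (bytes, bytearray)):
        docx = io.BytesIO(docx)
    result = {"equations": [], "images": {}, "ole_objects": []}
    with zipfile.ZipFile(docx) as zf:
        root = ET.fromstring(zf.read("word/document.xml"))
        # Display equations are wrapped in <m:oMathPara>, inline ones sit
        # directly in a paragraph; iter() walks the whole tree and finds both.
        for omath in root.iter(f"{{{M_NS}}}oMath"):
            result["equations"].append(ET.tostring(omath, encoding="unicode"))
        for name in zf.namelist():
            if name.startswith("word/media/"):
                result["images"][name] = zf.read(name)
            elif name.startswith("word/embeddings/"):
                result["ole_objects"].append(name)
    return result


def omml_text(omath_xml):
    """Concatenate the <m:t> text runs of one serialized equation.

    This flattens structure (fractions, scripts, radicals), so it is only a
    rough plain-text preview, not a faithful conversion.
    """
    elem = ET.fromstring(omath_xml)
    return "".join(t.text or "" for t in elem.iter(f"{{{M_NS}}}t"))


if __name__ == "__main__" and len(sys.argv) > 1:
    found = extract_from_docx(sys.argv[1])
    for i, eq in enumerate(found["equations"], 1):
        print(f"equation {i}: {omml_text(eq)}")
    print(f"{len(found['images'])} image part(s), "
          f"{len(found['ole_objects'])} embedded OLE part(s)")
```

Run it as `python extract_docx.py report.docx` (file names are placeholders). Keeping the raw OMML strings rather than only the flattened text means you can later feed them to something that understands OMML, for example an XSLT to MathML, without re-reading the package.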