Today I was asked to automate a task at work that takes up a lot of our time! Below is what needs to be done, and I would appreciate any implementation advice on how I can do this within the realm of my knowledge, if possible.
Problem
I have a PowerPoint document (.ppt) and would like to extract its text, which is in bullet-point format. I would like to insert these bullet points into an Excel sheet, one bullet point per row, and put the page (slide) each bullet point was taken from in the adjacent column.
So, basically: extract from the .ppt → insert into an Excel sheet, one bullet point per row.
Technologies available to me
Perl, PHP and Java.
To be honest, I would prefer PHP, as it is my primary language, but I am happy to consider anything else you guys/gals think is best. My second choice would be Perl, then Java. I do not want to be compiling classes and installing the JDK just for this! 🙂
Key Questions
- How do you reference a bullet point?
- Am I likely to end up with just a load of unstructured text in the Excel sheet?
- Are there any barriers to reading from a ppt file?
Update
I would consider MS technologies (VB, etc.) if they make life easier, but I have never used them and I despise MS technology! I hope I don’t get flamed by the evangelists! 🙂
Here is a sample script using Win32::OLE.
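By the way, once you have converted the slides into a format you can process, you can use Spreadsheet::WriteExcel to write the Excel output, even on non-Windows systems. Only the extraction step needs Windows with PowerPoint installed, because Win32::OLE drives PowerPoint through COM. So I would recommend two programs: one that converts the PowerPoint documents into an intermediate format, and another that generates the Excel files from it.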
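The second half of that two-program split can be sketched in core Perl. This is a minimal sketch under several assumptions, and it does not talk to PowerPoint itself: the first program is assumed to hand over, per slide, the slide number and the raw text of a text frame (for example, what a shape's TextFrame->TextRange->Text returns through Win32::OLE); bullets are assumed to be separated by carriage returns, with "\x0B" marking a soft line break inside a bullet; and the helper names split_bullets, csv_field and rows_to_csv are hypothetical. It also swaps in CSV, which Excel opens directly, for Spreadsheet::WriteExcel, so that it runs without extra modules.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Split the raw text of one text frame into bullet strings.
# Assumption: each bullet is one paragraph, paragraphs are separated by
# "\r" (CRLF and LF are accepted too), and "\x0B" is a soft line break.
sub split_bullets {
    my ($text) = @_;
    my @bullets;
    for my $para (split /\r\n?|\n/, $text) {
        $para =~ s/\x0B/ /g;         # soft line break -> single space
        $para =~ s/^\s+|\s+$//g;     # trim surrounding whitespace
        push @bullets, $para if length $para;   # skip empty paragraphs
    }
    return @bullets;
}

# Quote one CSV field: wrap it in double quotes and double any embedded
# quotes when it contains a comma, a quote or a line break.
sub csv_field {
    my ($value) = @_;
    return $value unless $value =~ /[",\r\n]/;
    (my $escaped = $value) =~ s/"/""/g;
    return qq{"$escaped"};
}

# Turn [slide_number, frame_text] pairs into CSV lines: one row per
# bullet, bullet text in the first column, slide number in the second.
sub rows_to_csv {
    my (@slides) = @_;
    my @lines = ('Bullet,Slide');    # header row
    for my $slide (@slides) {
        my ($number, $text) = @$slide;
        for my $bullet (split_bullets($text)) {
            push @lines, join(',', csv_field($bullet), $number);
        }
    }
    return @lines;
}

# Hypothetical sample data standing in for the first program's output.
my @slides = (
    [1, "First point\rSecond point, with a comma"],
    [2, "Only point\x0Bcontinued"],
);
print "$_\n" for rows_to_csv(@slides);
```

Redirecting the output to a .csv file gives a sheet with a header row followed by one row per bullet and its slide number alongside. If Spreadsheet::WriteExcel is available, the same rows could instead be written cell by cell with `$worksheet->write($row, $col, $value)` on a sheet obtained from `$workbook->add_worksheet()`.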
Note that an excellent source of information for Microsoft Office applications is the Object Browser. Open the Visual Basic Editor via Tools → Macro → Visual Basic Editor, then press F2 to browse the interfaces, methods, and properties provided by Microsoft Office applications.