I have a lot of .doc files with entry specifications for a database.

Question


Asked: June 12, 2026


I have a lot of .doc files with entry specifications for a database. I need to parse all of these documents and create database entries from the information they contain. I have been trying the COM approach. Each file has plain text at the top and bottom of the page, but the specifications are in a table in the middle of the page. If I don't unlink the new .txt file, I can see that the content is transferred to it, but there are a lot of invalid characters, shown as [] boxes, running throughout it. When I read that file with file_get_contents(), all of the text from the table is missing.

Is there some way to handle this programmatically? I can't find much documentation on the API of the word.application COM object. Ideally, I think I should strip the formatting and then save the file as a .txt file, or something to that effect.

Any help would be greatly appreciated.

Here is my code:

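    $dir   = $PATH."/scripts/specsheets/doc";
    $files = scandir($dir);

    // Start Word once and reuse it for every file.
    // new COM() throws com_exception on failure, so "or die" never fires.
    try {
        $word = new COM("word.application");
    } catch (com_exception $e) {
        die("Unable to instantiate Word: ".$e->getMessage());
    }

    foreach( $files as $file ) {
        if( strtolower(substr($file, -4)) == ".doc" ) {
            // Open() returns the Document object; keep a handle to it
            // instead of indexing Documents[1].
            $doc      = $word->Documents->Open($dir."/".$file);
            $new_file = $dir."/txt/".substr($file, 0, -4).".txt";

            $doc->SaveAs($new_file, 2);   // 2 = wdFormatText (plain text)
            $doc->Close(false);           // close without saving changes

            $output = file_get_contents($new_file);
            rename($dir."/".$file, $dir."/archive/".$file);

            echo utf8_encode($output);
        }
    }

    $word->Quit();
    $word = null;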
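One possible cleanup step for the Word-generated .txt file, offered as a sketch under assumptions the question does not confirm: that the [] boxes are either stray ASCII control characters or Windows-1252 bytes being mis-decoded. Word's plain-text output is typically in the Windows code page, while utf8_encode() assumes ISO-8859-1, so bytes like 0x93/0x94 (curly quotes) become invisible C1 control characters that may display as boxes. The function name `clean_word_text` is hypothetical, and it requires the mbstring extension.

```php
<?php
/**
 * Hypothetical helper: turn the text Word wrote with SaveAs(..., 2)
 * into clean UTF-8.
 *
 * Assumption: the input is Windows-1252 encoded and may contain stray
 * control characters.
 */
function clean_word_text(string $raw): string
{
    // Decode from Windows-1252 rather than ISO-8859-1 (needs mbstring).
    $text = mb_convert_encoding($raw, 'UTF-8', 'Windows-1252');

    // Normalise Windows (\r\n) and lone \r line endings to \n.
    $text = str_replace(["\r\n", "\r"], "\n", $text);

    // Remove remaining ASCII control characters, keeping tab (0x09) and
    // newline (0x0A). Safe on UTF-8: multibyte sequences never contain
    // bytes in 0x00-0x1F or 0x7F.
    return preg_replace('/[\x00-\x08\x0B-\x1F\x7F]/', '', $text);
}

// Usage inside the loop, in place of utf8_encode():
// echo clean_word_text(file_get_contents($new_file));
```

If the characters turn out to be an encoding problem, another option worth trying is saving with format 7 (wdFormatUnicodeText) instead of 2; the resulting file is UTF-16 and would then need converting from that encoding rather than Windows-1252.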


1 Answer

Editorial Team · Answer 1 · 2026-06-12T23:58:36+00:00


I couldn't find a solution using the COM approach, but you can get the text with the Windows version of the antiword program by running this command from PHP:

    $content = shell_exec("C:/antiword/antiword.exe ".$filename);

The Windows version is available at:

http://www-stud.rbi.informatik.uni-frankfurt.de/~markus/antiword/

It works very well; it even extracts the data in the tables. It definitely solved my issue.
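A slightly more defensive version of the antiword call, as a sketch: `doc_to_text_antiword` is a hypothetical name, the executable path is the one from the answer, and the directory in the usage loop is a placeholder. It quotes the document path with escapeshellarg() so file names with spaces survive, and treats empty output as a failure instead of silently storing an empty entry.

```php
<?php
/**
 * Hypothetical helper: extract plain text (including table contents)
 * from a .doc file by running antiword. Returns null on failure.
 *
 * Assumption: $antiword points at the antiword executable and its path
 * contains no spaces (it is passed unquoted, as in the answer).
 */
function doc_to_text_antiword(string $antiword, string $docPath): ?string
{
    // Quote the document path so spaces or shell metacharacters in the
    // file name are passed through as a single argument.
    $cmd = $antiword . ' ' . escapeshellarg($docPath);

    // shell_exec() returns stdout as a string, or null/false when the
    // command fails or prints nothing.
    $out = shell_exec($cmd);
    return (is_string($out) && $out !== '') ? $out : null;
}

// Usage: convert every .doc in a directory (placeholder paths).
$dir = __DIR__ . '/doc';
foreach (glob($dir . '/*.doc') ?: [] as $file) {
    $text = doc_to_text_antiword('C:/antiword/antiword.exe', $file);
    if ($text === null) {
        fwrite(STDERR, "antiword failed for $file\n");
        continue;
    }
    echo $text;
}
```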
