I saw the question “PHP – Get number of pages in a Word document”. I also need to determine the page count of a given Word file (doc/docx). I tried to investigate phplivedocx/ZF (@hobodave linked to those in the answers to the original post), but I got completely lost there. I can’t use any external web service either (such as a DOC2PDF site, then counting the pages in the PDF version, or similar).
Simply: is there any PHP code (using ZF or anything else in PHP, but excluding COM objects or external executables such as AbiWord; I’m on a shared Linux server without exec or similar functions) that finds the page count of a Word file?
EDIT: The Word versions that need to be supported are Microsoft Word 2003 and 2007.
Getting the number of pages for docx files is very easy:
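A minimal sketch of one way to do this, assuming the ZipArchive and SimpleXML extensions are available on the host. A docx file is a zip archive, and Word normally stores document statistics, including a `Pages` element, in `docProps/app.xml`. The function name `get_num_pages_docx` is just an illustrative choice.

```php
<?php
// Sketch: read the cached page count from a .docx file.
// Assumes ext-zip and SimpleXML are enabled; returns false on any failure.
function get_num_pages_docx($filename)
{
    $zip = new ZipArchive();
    if ($zip->open($filename) !== true) {
        return false; // not a readable zip archive
    }

    // Extended properties (page count, word count, ...) live here.
    $data = $zip->getFromName('docProps/app.xml');
    $zip->close();
    if ($data === false) {
        return false; // the properties part is missing
    }

    $xml = simplexml_load_string($data);
    if ($xml === false || !isset($xml->Pages)) {
        return false; // malformed XML or no <Pages> element
    }

    return (int) $xml->Pages;
}
```

Note that this value is whatever the authoring application wrote at the last save, so files produced by other tools may have a stale or missing `Pages` element.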
For the 97-2003 format it’s certainly challenging, but by no means impossible. The number of pages is stored in the SummaryInformation section of the document, but the OLE format of these files makes it a pain to find. The structure is defined extremely thoroughly (though badly, in my opinion) here and more simply here. I looked at this for an hour today but didn’t get very far (it’s not a level of abstraction I’m used to). I did output the hex to better understand the structure:
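A rough sketch of a hex dump helper for this kind of inspection. The function name `hex_dump` and the three-column layout (offset, hex bytes, printable ASCII) are my own choices, not anything prescribed by the format.

```php
<?php
// Sketch: format binary data as "offset  hex bytes  ascii" lines.
function hex_dump($data, $width = 16)
{
    if ($data === '') {
        return '';
    }

    $lines = array();
    foreach (str_split($data, $width) as $i => $chunk) {
        // Two hex digits per byte, separated by spaces.
        $hex = implode(' ', str_split(bin2hex($chunk), 2));
        // Replace non-printable bytes with a dot in the ASCII column.
        $ascii = preg_replace('/[^\x20-\x7E]/', '.', $chunk);
        $lines[] = sprintf('%08x  %-' . ($width * 3 - 1) . 's  %s', $i * $width, $hex, $ascii);
    }

    return implode("\n", $lines);
}
```

Usage would be something like `echo hex_dump(file_get_contents('document.doc'));`, where `document.doc` is a placeholder path. Directory entry names in an OLE file are stored as UTF-16, so a name such as SummaryInformation shows up in the ASCII column with a dot between each letter.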
This outputs hex in which you can find sections such as:
That lets you see the referencing info, such as:
From that you can determine the properties described:
That lets you find the relevant section, unpack it, and get the page count. Of course, this is the hard bit that I just don’t have time for, but it should set you in the right direction.
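To illustrate the unpacking step only, here is a sketch that reads the page count from the raw bytes of the SummaryInformation stream. It assumes the stream has already been extracted from the OLE container (walking the FAT and directory to locate it is the hard part and is not shown). It also assumes the property set layout as I understand it from the OLE property set specification: a 48-byte header ending in the offset of the first property set, then a table of (property ID, offset) pairs, with the page count stored under property ID 0x0E as a 32-bit integer (type 0x0003). The function name is hypothetical.

```php
<?php
// Sketch: extract the page count from an already-extracted
// SummaryInformation stream. Returns false if it cannot be found.
function page_count_from_summary_stream($stream)
{
    $length = strlen($stream);
    if ($length < 56) {
        return false; // too short for header plus property set header
    }

    // Bytes 0-1: byte-order mark, 0xFFFE when read as little-endian.
    $head = unpack('vbyteOrder', substr($stream, 0, 2));
    if ($head['byteOrder'] !== 0xFFFE) {
        return false;
    }

    // Bytes 44-47: offset of the first property set (assumed layout).
    $set = unpack('VsetOffset', substr($stream, 44, 4));
    $setOffset = $set['setOffset'];
    if ($setOffset + 8 > $length) {
        return false;
    }

    // Property set header: total size, then number of properties.
    $info = unpack('Vsize/Vcount', substr($stream, $setOffset, 8));

    for ($i = 0; $i < $info['count']; $i++) {
        $entryPos = $setOffset + 8 + $i * 8;
        if ($entryPos + 8 > $length) {
            return false;
        }
        // Each table entry: property ID, offset relative to the set start.
        $entry = unpack('Vid/Voffset', substr($stream, $entryPos, 8));
        if ($entry['id'] !== 0x0E) {
            continue; // not the page count property
        }

        $valuePos = $setOffset + $entry['offset'];
        if ($valuePos + 8 > $length) {
            return false;
        }
        // Typed value: 2-byte type, 2 bytes padding, then the value itself.
        $value = unpack('vtype/vpad/Vpages', substr($stream, $valuePos, 8));
        return $value['type'] === 0x0003 ? $value['pages'] : false;
    }

    return false;
}
```

All multi-byte fields are read as little-endian, which is why the sketch uses the `v` and `V` formats of `unpack`.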
M$ don’t make it easy!