I need to scan an uploaded PDF to determine whether its pages are all portrait or whether there are any landscape pages. Is there some way I can use PHP or a Linux command to scan the PDF for such pages?
(Updated answer: scroll down…)
You can use either pdfinfo (part of either the poppler-utils or the xpdf-tools) or identify (part of the ImageMagick toolkit).

identify:
Example output:
Or a bit simpler:
gives:
Note that identify uses zero-based page counting! Pages are 'landscape' if their width is greater than their height, and 'portrait' if their height is greater than their width. They are neither if both are equal.
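The orientation rule just described fits in a few lines. This is a minimal Python sketch; the function name orientation is hypothetical and not part of identify or pdfinfo. It only compares the two dimensions a tool reports and ignores any page rotation, so it classifies pages as they are defined, not necessarily as a viewer displays them.

```python
def orientation(width: float, height: float) -> str:
    """Classify a page by its dimensions: landscape if wider than tall,
    portrait if taller than wide, and neither if both sides are equal."""
    if width > height:
        return "landscape"
    if height > width:
        return "portrait"
    return "neither"


# A4 in PostScript points, plus a square page
print(orientation(595, 842))  # portrait
print(orientation(842, 595))  # landscape
print(orientation(612, 612))  # neither
```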
The advantage is that identify lets you tweak the output format quite easily and very extensively.

pdfinfo:
Example output:
pdfinfo is definitely faster and also more precise than identify when it comes to multi-page PDFs. (The 13-page PDF I tested this with took identify 31 seconds to process, whereas pdfinfo needed less than half a second…)

Be warned: by default pdfinfo reports the size of the first page only. To get sizes for all pages (as you may know, there are PDFs which use mixed page sizes as well as mixed orientations), you have to modify the command:

Output now:
This will print the sizes of page 3 (-f, the first page to report) through page 13 (-l, the last page to report).
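If you would rather post-process the per-page output in a program than in the shell, something like the following Python sketch could work. It assumes that pdfinfo, when run with -f and -l, prints one line per page of the form "Page    3 size: 595.276 x 841.89 pts (A4)"; the regular expression, the helper names (parse_page_sizes, landscape_pages, run_pdfinfo) and the sample text are illustrative assumptions, not part of pdfinfo itself. run_pdfinfo additionally assumes that pdfinfo is on the PATH and is not called in the example.

```python
from __future__ import annotations

import re
import subprocess

# Assumed layout of pdfinfo's per-page size lines (printed with -f/-l), e.g.
#   Page    3 size: 595.276 x 841.89 pts (A4)
SIZE_RE = re.compile(
    r"^Page\s+(\d+)\s+size:\s+([\d.]+)\s+x\s+([\d.]+)\s+pts", re.MULTILINE
)


def parse_page_sizes(pdfinfo_output: str) -> dict[int, tuple[float, float]]:
    """Map page number -> (width, height) in points."""
    return {
        int(page): (float(w), float(h))
        for page, w, h in SIZE_RE.findall(pdfinfo_output)
    }


def landscape_pages(sizes: dict[int, tuple[float, float]]) -> list[int]:
    """Return the pages whose width is greater than their height."""
    return sorted(page for page, (w, h) in sizes.items() if w > h)


def run_pdfinfo(path: str, first: int, last: int) -> str:
    """Run pdfinfo with -f/-l so that every page in the range is reported."""
    result = subprocess.run(
        ["pdfinfo", "-f", str(first), "-l", str(last), path],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


# Hypothetical sample output; the "Pages:" line is ignored by the regex.
sample = """Pages:          3
Page    1 size: 595 x 842 pts (A4)
Page    2 size: 842 x 595 pts (A4)
Page    3 size: 595.276 x 841.89 pts (A4)
"""
sizes = parse_page_sizes(sample)
print(landscape_pages(sizes))  # [2]
```

Because Python compares floating-point values directly, fractional sizes such as 841.89 need no rounding or truncation before the comparison.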
Scripting it:
(The bc trick is required because the shell's -gt comparison works only with integers. Dividing by 1 with bc truncates the possibly fractional values to integers…)

Result:
Update: Using the 'right' pdfinfo to discover page rotations…

My initial answer tooted the horn of pdfinfo. Serenade X says in a comment that his/her problem is to discover rotated pages.

OK, here is some additional info which is not yet widely known and therefore has not yet really been absorbed by all pdfinfo users…

As I mentioned, there are two different pdfinfo utilities around:

- the one from the xpdf-utils package (on some platforms also named xpdf-tools);
- the one from the poppler-utils package (on some platforms also named poppler-tools; sometimes it is not split out as a separate package but is part of the main poppler package).

Poppler's pdfinfo output

So here is a sample output from Poppler's pdfinfo command. The tested file is a 2-page PDF where the first page is portrait A4 and the second page is landscape A4:

Do you see the lines saying Page 1 rot: 0 and Page 2 rot: 0?

Do you notice the lines saying Page 1 size: 595 x 842 pts (A4) and Page 2 size: 842 x 595 pts (A4), and the differences between the two?

XPDF's pdfinfo output

Now let's compare this to the output of XPDF's pdfinfo:

You may notice one more difference, if you look closely enough. I won't point my finger at it, and will keep my mouth shut for now… 🙂
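Because Poppler's pdfinfo prints both a size line and a rot line for each page, the two can be combined to get the orientation a viewer would actually show: a rotation of 90 or 270 degrees swaps the displayed width and height. The Python sketch below is only an illustration; the function name displayed_orientation, the regular expressions and the sample text are assumptions, and pages without a rot line (as with XPDF's pdfinfo) are simply treated as unrotated.

```python
from __future__ import annotations

import re

# Assumed per-page lines from Poppler's pdfinfo run with -f/-l, e.g.
#   Page    2 size: 842 x 595 pts (A4)
#   Page    2 rot:  90
SIZE_RE = re.compile(r"^Page\s+(\d+)\s+size:\s+([\d.]+)\s+x\s+([\d.]+)", re.MULTILINE)
ROT_RE = re.compile(r"^Page\s+(\d+)\s+rot:\s+(\d+)", re.MULTILINE)


def displayed_orientation(pdfinfo_output: str) -> dict[int, str]:
    """Orientation of each page after applying its rotation."""
    rotations = {int(p): int(r) for p, r in ROT_RE.findall(pdfinfo_output)}
    result: dict[int, str] = {}
    for p, w, h in SIZE_RE.findall(pdfinfo_output):
        page, width, height = int(p), float(w), float(h)
        # 90 and 270 degrees swap width and height; 0 and 180 do not.
        # Pages with no "rot:" line default to 0 (unrotated).
        if rotations.get(page, 0) % 180 == 90:
            width, height = height, width
        if width > height:
            result[page] = "landscape"
        elif height > width:
            result[page] = "portrait"
        else:
            result[page] = "neither"
    return result


# Hypothetical output for a 2-page file whose page 2 is a landscape A4 page
# carrying a 90-degree rotation.
sample = """Page    1 size: 595 x 842 pts (A4)
Page    1 rot:  0
Page    2 size: 842 x 595 pts (A4)
Page    2 rot:  90
"""
print(displayed_orientation(sample))  # {1: 'portrait', 2: 'portrait'}
```

The comparison itself is the same as for unrotated pages; the only extra step is the width/height swap for quarter-turn rotations.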
Poppler's pdfinfo correctly reports rotation of page 2

Next, I rotate the second page of the file by 90 degrees using pdftk (I don't have Adobe Acrobat around):

Now Poppler's pdfinfo reports this:

As you can see, the line Page 2 rot: 90 tells us what we are looking for. XPDF's pdfinfo would report essentially the same info about the changed file as it does about the original one. Of course, it would still correctly capture the changed Creator:, Producer: and *Date: info, but it would miss the rotated page…

Also note this detail: page 2 was originally designed as a landscape page, which can be seen from the Page 2 size: 842 x 595 pts (A4) part. However, it shows up in the current PDF as a portrait page, as can be seen from the Page 2 rot: 90 part.

Also note that there are 4 different values that could appear for the rotation info:
- 0 (no rotation)
- 90 (rotation to the East, or 90 degrees clockwise)
- 180 (rotation to the South: tumbled page image, upside-down, or 180 degrees clockwise)
- 270 (rotation to the West, or 90 degrees counter-clockwise, or 270 degrees clockwise)

Some Background Info
Poppler (developed by The Poppler Developers) is a fork of XPDF (developed by Glyph & Cog LLC) that happened around 2005. (One important reason for the fork given by the Poppler developers at the time: Glyph & Cog didn't always provide timely bugfixes for security-related problems…)
Anyway, for a very long time the Poppler fork kept the associated command-line utilities, their command-line parameters and syntax, as well as the format of their output, compatible with the original (XPDF/Glyph & Cog LLC) ones.
Existing Poppler tools gaining additional features over competing XPDF tools
However, more recently they started to add additional features. Off the top of my head:

- pdfinfo now also reports the rotation status of each page (starting with Poppler v0.19.0, released March 1st, 2012).
- pdffonts now also reports the font encoding for each font (starting with Poppler v0.19.1, released March 15th, 2012).

Poppler tools getting more siblings
The Poppler tools also provide some extra command-line utilities which are not in the original XPDF package (some of which were added only quite recently):

- pdftocairo: utility for creating PNG, JPEG, PostScript, EPS, PDF, SVG (using Cairo)
- pdfseparate: utility to extract PDF pages
- pdfunite: utility to merge PDF files
- pdfdetach: utility to list or extract embedded files from a PDF
- pdftohtml: utility to convert PDF files to HTML