I have a PDF document that also contains images.
Now I want to know the resolution of these images.
A first step would be to somehow get the images out of the PDF document. But how?
Is that even possible with something provided in Cocoa?
Have a look at this answer for your other question:
Basically, you can now use the (new) -list parameter of Poppler’s pdfimages command-line utility (it will NOT work with XPDF’s version of pdfimages!). It will report the dimensions of each image appearing on the queried pages.
(You can also use it to extract images from a PDF:

    pdfimages -png -f 3 -l 5 some.pdf prefix---

will extract all images as PNGs from the PDF file, starting with first page 3 and ending with last page 5, using a filename prefix of prefix--- for each image. But this problem seems to not be the main focus of your question…)

Example:
    pdfimages -list -f 1 -l 3 /Users/kurtpfeifle/Downloads/ct-magazin-14-2012.pdf

    page   num  type   width height color comp bpc  enc interp  object ID
    ---------------------------------------------------------------------
       1     0 image    1247  1738  rgb     3   8  jpx    no      3053  0
       2     1 image     582   839  gray    1   8  jpeg   no      2080  0
       2     2 image     344   364  gray    1   8  jpx    no      2079  0
       3     3 image     581   838  rgb     3   8  jpeg   no         7  0
       3     4 image    1088   776  rgb     3   8  jpx    no         8  0
       3     5 image       6     6  rgb     3   8  image  no         9  0
       3     6 image       8     6  rgb     3   8  image  no        10  0
       3     7 image       4     6  rgb     3   8  image  no        11  0
       3     8 image     212   106  rgb     3   8  jpx    no        12  0
       3     9 image     150    68  rgb     3   8  jpx    no        13  0
       3    10 image       6     6  rgb     3   8  image  no        14  0
       3    11 image       4     4  rgb     3   8  image  no        15  0

It does not directly report the DPI resolution, but from the ‘width’ and ‘height’ dimensions you can calculate it easily: measure the width of the picture on your screen with an inch ruler, then divide the ‘width’ in pixels by the measured number of inches…
You find this strange, because the result is dependent on your current zoom level? Yes, it is!
The concept of the ‘resolution’ is always dependent on the environment. A so-called ‘hi-res’ picture basically always has lots of pixels in width and height. This allows for better quality (or ‘resolution’) if the picture needs to be displayed or printed with higher zoom levels.
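The ruler calculation described above can be sketched in a few lines of Python. This is only an illustration: the function name effective_ppi and the two ruler measurements are made up for the example, while the 1247-pixel width is taken from image 0 in the listing. It shows how the same image yields a different resolution at each zoom level.

```python
def effective_ppi(pixels: int, displayed_inches: float) -> float:
    """Pixels per inch for one image dimension shown at a given physical size."""
    if displayed_inches <= 0:
        raise ValueError("displayed size must be positive")
    return pixels / displayed_inches


# Image 0 in the listing is 1247 pixels wide.
width_px = 1247

# Hypothetical ruler measurements of that image at two different zoom levels.
for inches in (5.0, 10.0):
    print(f"{inches:4.1f} in wide -> {effective_ppi(width_px, inches):.1f} ppi")
```

Doubling the displayed width halves the computed value, which is exactly the zoom dependence described above.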
Update
Meanwhile there is a new version of (Poppler’s) pdfimages. It reports the resolution of embedded images as well, in PPI (pixels per inch), in horizontal (x-ppi) and vertical (y-ppi) directions.
This new feature first appeared in Poppler version 0.25 (released Wed December 11, 2013). It additionally reports… …of embedded images.
Limitations of pdfimages -list
Perhaps I should also make you aware of the limitations of the pdfimages utility, and give an example where its output report is not completely correct.
One example is this handcoded PDF from my (recently created) GitHub repository of PDFs, which is meant to help beginners study the syntax of PDF source code.
I originally created this PDF in order to demonstrate a bug with Mozilla’s PDF.js renderer.
Here is a screenshot of how it looks in PDF.js (left) and how it should look when rendered correctly (right, rendered by Ghostscript and Adobe Reader):
The PDF file contains a 2×2 pixel image, embedded only once (with object ID 5 0), but displayed on the page multiple times with different settings, where each time the image is placed…
Under these extreme circumstances pdfimages -list falls flat on its face when trying to determine some of the resolutions for instances of this image.
pdfimages -list gets most values correct if there is no rotation and/or no skewing involved. It is no wonder that there are discrepancies if the image is rotated or skewed, because how would you even reliably define an x-ppi and y-ppi value for such cases? That explains the (completely wrong) values of 72000 y-ppi for image no. 5 and 14401 x-ppi for image no. 8.
As you can easily see, pdfimages is rather clever at determining other image properties:
It reports object ID 5 0 for all instances of the displayed image, indicating that this image is embedded once, but displayed multiple times on the page.
It reports the image dimensions as 2x2 pixels.
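The object ID check can also be automated. The following Python sketch parses the text printed by pdfimages -list and groups the rows by object ID, so that images embedded once but displayed several times stand out. The names ImageRow, parse_pdfimages_list and displayed_more_than_once are invented for this example, the sketch assumes the column layout shown in the earlier listing (newer Poppler versions append further columns, which it ignores), and the sample rows are made up for illustration rather than copied from real output.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple


class ImageRow(NamedTuple):
    page: int
    num: int
    width: int
    height: int
    object_id: str  # the two "object ID" columns joined, e.g. "5 0"


def parse_pdfimages_list(text: str) -> List[ImageRow]:
    """Parse the table printed by `pdfimages -list` into rows."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        # Data rows have at least 12 columns and start with a page number;
        # this skips the header line and the dashed separator.
        if len(fields) < 12 or not fields[0].isdigit():
            continue
        rows.append(ImageRow(
            page=int(fields[0]),
            num=int(fields[1]),
            width=int(fields[3]),
            height=int(fields[4]),
            object_id=f"{fields[10]} {fields[11]}",
        ))
    return rows


def displayed_more_than_once(rows: List[ImageRow]) -> Dict[str, List[int]]:
    """Map each object ID that occurs in several rows to its image numbers."""
    by_object = defaultdict(list)
    for row in rows:
        by_object[row.object_id].append(row.num)
    return {obj: nums for obj, nums in by_object.items() if len(nums) > 1}


# MADE-UP sample in the old column layout, not real pdfimages output:
# a 2x2 image (object 5 0) shown three times, plus one unrelated image.
SAMPLE = """\
page   num  type   width height color comp bpc  enc interp  object ID
---------------------------------------------------------------------
   1     0 image       2     2  rgb     3   8  image  no         5  0
   1     1 image       2     2  rgb     3   8  image  no         5  0
   1     2 image       2     2  rgb     3   8  image  no         5  0
   1     3 image     212   106  rgb     3   8  jpx    no        12  0
"""

rows = parse_pdfimages_list(SAMPLE)
print(displayed_more_than_once(rows))  # {'5 0': [0, 1, 2]}
```

To run this against a real file, the text could be obtained with subprocess.run(["pdfimages", "-list", path], capture_output=True, text=True).stdout, assuming Poppler’s pdfimages is on the PATH.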