I’ve written a script to use ImageMagick to convert PDFs into JPGs for each page, along with resizing/etc.
Where it gets slightly more tricky is some PDFs have the middle two-page spread as “one page” – so it is extra wide. Is there ANY way to “detect” this and crop the left and right sides, as two separate pages?
Assuming you want to use ImageMagick (and only ImageMagick) for this: that can’t be done. ImageMagick cannot process PDF input all by itself. It has to make use of Ghostscript anyway, so without a local Ghostscript installation it won’t work. (You will not necessarily see Ghostscript at work while you feed PDF input to ImageMagick, unless you add -verbose to its command line, because ImageMagick’s delegation of the job to Ghostscript happens behind your back…)
Your question has two parts:
Detect page sizes
You can use ImageMagick’s identify to detect the page sizes of a PDF. Just run the simplest command:
identify multipage.pdf
The output lists one line per page, and its page count is 0-based. So [0] indicates the first page, [1] the second page, etc.
To customize the output a bit better, you could pass identify a -format string that prints just each page’s width and height.
For a double-spread page the respective output should be 1190 x 792 or similar.
However, be warned: using ImageMagick to query the page sizes of PDF files is veeeery slow. Therefore, better use a different tool for this sub-task: pdfinfo. This will be faster by several orders of magnitude. Run over a page range (its -f and -l options), it prints the size of each page.
If you need additional info about the pages’ ArtBox, TrimBox, BleedBox and CropBox values, just add -box to the command line.
As I said: pdfinfo is significantly faster in identifying page sizes for PDFs than ImageMagick is. Use the right tool for the job.
Crop left and right parts of a page
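Before anything can be cropped, the spread pages have to be located. As a minimal sketch, assuming pdfinfo prints per-page lines of the form "Page N size: W x H pts" when given a page range, the check could be scripted roughly like this. The function name find_spreads and the 1.5x threshold are my own choices for illustration, not part of pdfinfo, and page rotation is ignored.

```shell
# Reads the output of `pdfinfo -f FIRST -l LAST file.pdf` on stdin and prints
# the 1-based numbers of all pages that are at least 1.5 times as wide as the
# narrowest page (assumption: those are the double-page spreads).
find_spreads() {
    awk '
        # Per-page lines look like: "Page    2 size: 1190 x 842 pts"
        $1 == "Page" && $3 == "size:" {
            n++
            page[n]  = $2
            width[n] = $4 + 0
            if (n == 1 || width[n] < min) min = width[n]
        }
        END {
            for (i = 1; i <= n; i++)
                if (width[i] >= 1.5 * min) print page[i]
        }
    '
}

# Possible usage (page range chosen by you):
#   pdfinfo -f 1 -l 40 multipage.pdf | find_spreads
```

The numbers printed are 1-based, so subtract 1 before handing them to ImageMagick.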
Now that you have identified the large double-spread page, you could use a Ghostscript-based method to split the page down the middle.
Adapting such a method will result in 2 PDF pages that still contain all their original vector and font info.
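One Ghostscript-based way to do such a split, sketched here under assumptions rather than as the definitive recipe: fix the output media to one half of the spread with -g and shift the content with PageOffset. The function name gs_split_spread, the GS override variable and the output file names are hypothetical. The defaults assume an A4-based spread (2 × 595 pt wide, 842 pt high) and whole-point sizes.

```shell
# Splits one spread page into page-N-left.pdf and page-N-right.pdf.
# Arguments: PDF file, 1-based page number, then optionally the width and
# height of ONE half in whole points (defaults: 595 and 842, i.e. A4).
# Set GS to another command name to override the Ghostscript binary.
gs_split_spread() {
    pdf=$1 page=$2 w=${3:-595} h=${4:-842}
    for side in left right; do
        # The right half is brought into view by shifting content left.
        if [ "$side" = left ]; then off=0; else off=$((0 - w)); fi
        # pdfwrite defaults to 720 dpi, so -g takes the size in points x 10.
        "${GS:-gs}" -o "page-${page}-${side}.pdf" -sDEVICE=pdfwrite \
            -dFirstPage="$page" -dLastPage="$page" \
            -g"$((w * 10))x$((h * 10))" \
            -c "<</PageOffset [${off} 0]>> setpagedevice" \
            -f "$pdf"
    done
}

# Possible usage, for a spread on page 16:
#   gs_split_spread original.pdf 16
```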
Alternatively, you can use ImageMagick. Assuming your ‘double-spread’ page is of dimension 1190×842 pt, based on A4 (595×842 pt), and assuming it is page 16 (which translates to [15] for ImageMagick) inside an original PDF, you would run two convert commands, one cropping each half of the page. The result gives you two raster images.
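A rough sketch of that ImageMagick route, assuming the spread should simply be cut at 50% of its width: -gravity West and East anchor the crop window to the left and right edge respectively. The function name crop_spread, the CONVERT override variable, the default density of 150 and the output file names are all assumptions of mine.

```shell
# Rasterizes one spread page and writes its left and right halves as JPGs.
# Arguments: PDF file, 1-based page number, optional density (default 150).
# Set CONVERT to another command name to override the ImageMagick binary.
crop_spread() {
    pdf=$1 page=$2 density=${3:-150}
    idx=$((page - 1))   # ImageMagick counts pages from 0
    for side in left right; do
        if [ "$side" = left ]; then gravity=West; else gravity=East; fi
        # Crop half the width at full height, then reset the canvas offset.
        "${CONVERT:-convert}" -density "$density" "${pdf}[${idx}]" \
            -gravity "$gravity" -crop 50%x100%+0+0 +repage \
            "page-${page}-${side}.jpg"
    done
}

# Possible usage, for a spread on page 16:
#   crop_spread original.pdf 16
```

Because the crop is given in percent, the same call works whatever density you rasterize at.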