I have a multi-page PDF file that has information I need to parse. The

Asked: May 26, 2026 (2026-05-26T13:41:11+00:00)

I have a multi-page PDF file that has information I need to parse. Each set of information and its picture is confined to its own page. I need to extract the text and image from the PDF.

I’m using CentOS and PHP.

My attempt:

I originally tried using a combination of pdftotext and ImageMagick. I converted the PDF into images, which did separate the pages into one image each. Unfortunately, the quality of the picture on each page came out very poor.

My goal:

I need to split the PDF into multiple PDFs, one per page. Then, I need to extract the image from that page with the best quality possible.

Thanks.

1 Answer

Editorial Team · Answer 1 · 2026-05-26T13:41:12+00:00

ImageMagick is not the right tool for this task.

When you need to extract images from a PDF at their original size (which is the best you can get, since any other resolution is either lower or higher than the original), you should use

pdfimages

http://www.foolabs.com/xpdf/download.html

(static binaries are available if you cannot compile from source)

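Syntax:

pdfimages file.pdf image-root

The resulting images will have the extension .ppm, unless you add the -j switch to get JPEG images as output. Note that -j only affects images that are stored as JPEG inside the PDF; any other image is still written as .ppm (or .pbm if it is monochrome).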
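Since the question asks for the text and the image of each page separately, here is a rough sketch of how the per-page extraction could be scripted. It assumes pdftotext and pdfinfo are installed next to pdfimages (they ship in the same xpdf package, and in poppler-utils as well). The function names, the out/page-N naming and the DRY_RUN variable are made up for illustration and are not part of any tool.

```shell
#!/bin/sh
# Sketch: per-page text and image extraction with the xpdf/poppler tools.
# Set DRY_RUN=1 to print the commands instead of running them.

run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Number of pages, read from the "Pages:" line that pdfinfo prints.
page_count() {
    pdfinfo "$1" | awk '/^Pages:/ {print $2}'
}

# extract_page PDF PAGE OUTDIR
# Extract the text and the embedded images of one page.
extract_page() {
    # -f / -l restrict both tools to the given first / last page.
    run pdftotext -f "$2" -l "$2" "$1" "$3/page-$2.txt"
    # -j writes JPEG-encoded images as .jpg; others come out as .ppm/.pbm.
    run pdfimages -j -f "$2" -l "$2" "$1" "$3/page-$2"
}

# extract_all PDF OUTDIR
# Loop over every page of the PDF.
extract_all() {
    run mkdir -p "$2"
    total=$(page_count "$1")
    if [ -z "$total" ]; then
        echo "could not read the page count of $1" >&2
        return 1
    fi
    n=1
    while [ "$n" -le "$total" ]; do
        extract_page "$1" "$n" "$2"
        n=$((n + 1))
    done
}
```

Calling extract_all file.pdf out should leave one out/page-N.txt per page, plus image files named out/page-N-000.jpg (or .ppm) and so on. From PHP the same commands can be started with exec() or shell_exec(). Because pdfimages copies the embedded image data out of the PDF rather than rendering the page, there should be no quality loss from rasterizing.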
