What is the best way to programmatically check if a PDF file is a totally scanned one?

Question

Asked: May 13, 2026 (2026-05-13T21:51:04+00:00)

What is the best way to programmatically check if a PDF file is a totally scanned one?
I have iText and PDFBox at my disposal. I can check whether a PDF file contains text and use the result to decide whether the file has been OCRed, but this approach is not 100% accurate. Is there another way to tackle the problem?

The solution must be Java-based.

1 Answer

Editorial Team · Answer 1 · 2026-05-13T21:51:05+00:00

Your best bet is probably to combine several checks: see whether the file contains text, and see whether each page contains a large page-sized image or many tiled images that together cover the page. If you also check the metadata, this should cover most cases.
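
As a rough illustration of how the text and image checks might be combined, here is a minimal sketch of the decision logic only. The class `ScannedPdfHeuristic`, the `PageStats` holder, and the 0.9 coverage threshold are all hypothetical names and values chosen for this example; extracting the per-page numbers (text length and the fraction of the page covered by images) with PDFBox or iText is assumed to happen elsewhere.

```java
import java.util.List;

/** Hypothetical helper: decides whether a PDF looks fully scanned from per-page measurements. */
final class ScannedPdfHeuristic {

    /** Per-page measurements; how they are extracted from the PDF is outside this sketch. */
    static final class PageStats {
        final int textChars;        // number of extractable text characters on the page
        final double imageCoverage; // fraction of the page area covered by images, 0.0 to 1.0

        PageStats(int textChars, double imageCoverage) {
            this.textChars = textChars;
            this.imageCoverage = imageCoverage;
        }
    }

    // Assumed threshold, not a standard value: tune it against your own documents.
    static final double MIN_IMAGE_COVERAGE = 0.9;

    /** A page looks scanned when one large image or many tiled images cover almost all of it. */
    static boolean isScannedPage(PageStats page) {
        return page.imageCoverage >= MIN_IMAGE_COVERAGE;
    }

    /** A scanned-looking page that also carries text most likely has an OCR text layer. */
    static boolean hasOcrLayer(PageStats page) {
        return isScannedPage(page) && page.textChars > 0;
    }

    /** The document counts as totally scanned only if every page looks scanned. */
    static boolean isLikelyFullyScanned(List<PageStats> pages) {
        if (pages.isEmpty()) {
            return false;
        }
        for (PageStats page : pages) {
            if (!isScannedPage(page)) {
                return false;
            }
        }
        return true;
    }
}
```

Note that the presence of text alone does not rule out a scan here: a page with full image coverage and some text is treated as a scanned page with an OCR layer, which is the case the text-only check gets wrong. The text length could come from something like PDFBox's `PDFTextStripper`, and the coverage from the sizes of the images placed on each page.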
