I’m facing a problem when I try to read the content of a PDF

Asked: May 25, 2026

I’m having trouble reading the content of a PDF document. I’m using iText 2.1.7 with Java, and I need to analyze the document’s content. At first I used PdfTextExtractor’s getTextFromPage method, and it worked correctly, but only when the page contains nothing but text. If the page contains an image, the String returned by getTextFromPage is a set of meaningless symbols (maybe a different character encoding?), and I lose the content of the whole page. I tried the latest version of iText and it works fine, but if I’m not mistaken its license isn’t entirely free for my case (I’m working on a web application for a commercial customer, which serves PDFs on the fly), so I can’t use it. I would really appreciate any suggestions.

In case you need it, here is the code:

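import com.lowagie.text.pdf.PdfReader;
import com.lowagie.text.pdf.parser.PdfTextExtractor;

PdfReader pdf = new PdfReader(doc);  // doc is just a byte[]
PdfTextExtractor pdfTextExtractor = new PdfTextExtractor(pdf);  // one extractor serves every page
int pageCount = pdf.getNumberOfPages();
for (int i = 1; i <= pageCount; i++) {  // page numbers are 1-based
    String pageText = pdfTextExtractor.getTextFromPage(i);
    // ... analyze pageText ...
}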

Thanks in advance, regards.


1 Answer

Editorial Team · Answer 1 · 2026-05-25T15:06:27+00:00

I think that your PDF contains an inline image, and I do not think that iText 2.1.7 can deal with that.
You can find information regarding the license here
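
One way to test the inline-image guess before changing libraries is to look at the page's decoded content stream. In PDF, an inline image is written directly into the content stream between the operators BI (begin image), ID (image data) and EI (end image). The class below is only a rough sketch: InlineImageCheck and looksLikeInlineImage are made-up names, it uses nothing but the JDK, and it is a heuristic that can be fooled (for example by the same tokens appearing inside a text string).

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of iText: heuristic scan of a decoded
// PDF content stream for the inline-image operators BI ... ID.
public class InlineImageCheck {

    // PDF whitespace characters: NUL, tab, LF, FF, CR, space.
    private static boolean isPdfWhitespace(byte b) {
        return b == 0 || b == 9 || b == 10 || b == 12 || b == 13 || b == 32;
    }

    // True if the two-character operator sits at index i as a standalone token,
    // i.e. it is bounded by whitespace or by the start/end of the stream.
    private static boolean isOperatorAt(byte[] data, int i, char c1, char c2) {
        if (i + 1 >= data.length) return false;
        if (data[i] != c1 || data[i + 1] != c2) return false;
        boolean startOk = (i == 0) || isPdfWhitespace(data[i - 1]);
        boolean endOk = (i + 2 == data.length) || isPdfWhitespace(data[i + 2]);
        return startOk && endOk;
    }

    // Returns true if a BI operator is followed somewhere later by an ID operator.
    public static boolean looksLikeInlineImage(byte[] content) {
        for (int i = 0; i < content.length; i++) {
            if (isOperatorAt(content, i, 'B', 'I')) {
                for (int j = i + 2; j < content.length; j++) {
                    if (isOperatorAt(content, j, 'I', 'D')) return true;
                }
                return false; // a BI with no ID after it: no later BI can match either
            }
        }
        return false;
    }

    public static void main(String[] args) {
        byte[] withImage = "q BI /W 1 /H 1 /BPC 8 /CS /G ID x EI Q"
                .getBytes(StandardCharsets.ISO_8859_1);
        byte[] textOnly = "BT /F1 12 Tf (Hello) Tj ET"
                .getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(looksLikeInlineImage(withImage)); // true
        System.out.println(looksLikeInlineImage(textOnly));  // false
    }
}
```

With iText 2.1.7 the bytes to check should be obtainable from PdfReader's getPageContent(i) for each page, as far as I recall; treat that call as an assumption and check it against the Javadoc of your version. If the pages that come out garbled are exactly the ones where this returns true, the inline-image explanation becomes more plausible.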
