I am getting a zip file as an InputStream. I am then separating

Asked: June 5, 2026

I am getting a zip file as an InputStream and separating out each file inside it. I then pass each file's byte array to a PDF library, which internally uses Apache PDFBox 1.6.0, to convert it to an image.

However, when I pass the byte array to the PDFDocumentReader, I get the following exception:

SEVERE: expected='endstream' actual='' org.apache.pdfbox.io.PushBackInputStream@44c2beb9
java.io.IOException: expected='endstream' actual='' org.apache.pdfbox.io.PushBackInputStream@44c2beb9
at org.apache.pdfbox.pdfparser.BaseParser.parseCOSStream(BaseParser.java:439)
at org.apache.pdfbox.pdfparser.PDFParser.parseObject(PDFParser.java:530)
at org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:172)
at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:862)
at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:829)
at org.dopdf.document.read.pdf.PDFDocumentReader.init(PDFDocumentReader.java:98)

To fetch each file from the zip, I use the following code:

    ZipInputStream zis = new ZipInputStream(aZipFile); // aZipFile is byte array
    ZipEntry entry;
    ArrayList<String> nameOfIgnoredFiles = new ArrayList<String>();
    byte data[] = null;
    while ((entry = zis.getNextEntry()) != null) {
        if (entry.getName().endsWith(".pdf")) {
            int dataSize = (int)entry.getSize();
            data = new byte[dataSize];
            zis.read(data);
            // i use data and pass it to the pdf box.
        } else {
            nameOfIgnoredFiles.add(entry.getName());
        }
    }

The data byte array fetched above is then passed to the reader like this:

    PDFDocumentReader document = new PDFDocumentReader(data); // here i get the error

What am I doing wrong? Can you suggest a solution? I suspect the problem is in how I fetch the data byte array. What is the best way to do it?

1 Answer

Editorial Team · Answer 1 · 2026-06-05T13:48:53+00:00

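You are assuming that zis.read(data) fills the buffer. Check the API documentation. It isn’t guaranteed to do that: read(byte[]) may return after reading fewer bytes than requested, which leaves the rest of the array zero-filled and hands the parser a truncated PDF. That matches the expected='endstream' error. You are also assuming that entry.getSize() is known (it returns -1 when it is not), that the size fits into an int, and that the item itself fits into memory. None of these assumptions is valid.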

Surely you can pass the entry’s InputStream to a pdfbox API?
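If the byte array really is needed, a minimal sketch of reading each entry completely could look like the following. It loops until read() returns -1, which ZipInputStream does at the end of the current entry, and it does not rely on entry.getSize(). The class name ZipPdfExtractor and the methods readFully and extractPdfs are made up for this illustration; they are not part of PDFBox or of the library in the question. The sketch still buffers each PDF in memory, so it only addresses the short-read and unknown-size problems.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

// Hypothetical helper class, named only for this example.
class ZipPdfExtractor {

    // Copies everything up to end-of-stream into a byte array.
    // A single read() call may return fewer bytes than the buffer holds,
    // so the loop continues until read() reports -1.
    static byte[] readFully(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[8192];
        int n;
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n);
        }
        return out.toByteArray();
    }

    // Returns the contents of every ".pdf" entry keyed by entry name and
    // records the names of all other entries in ignoredNames.
    static Map<String, byte[]> extractPdfs(InputStream zipStream,
                                           List<String> ignoredNames) throws IOException {
        Map<String, byte[]> pdfs = new LinkedHashMap<String, byte[]>();
        try (ZipInputStream zis = new ZipInputStream(zipStream)) {
            ZipEntry entry;
            while ((entry = zis.getNextEntry()) != null) {
                if (entry.getName().endsWith(".pdf")) {
                    // For ZipInputStream, -1 from read() means "end of this entry",
                    // so readFully stops at the entry boundary.
                    pdfs.put(entry.getName(), readFully(zis));
                } else {
                    ignoredNames.add(entry.getName());
                }
            }
        }
        return pdfs;
    }
}
```

Each array in the returned map can then be passed to new PDFDocumentReader(data) as in the question.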
