I am getting a zip file as an InputStream and extracting each file inside it. I then pass each file's byte array to a PDF reader library, which internally uses Apache PDFBox 1.6.0, to convert it to an image.
However, when I pass the byte array to the PDFDocumentReader, I get the following exception:
SEVERE: expected='endstream' actual='' org.apache.pdfbox.io.PushBackInputStream@44c2beb9
java.io.IOException: expected='endstream' actual='' org.apache.pdfbox.io.PushBackInputStream@44c2beb9
at org.apache.pdfbox.pdfparser.BaseParser.parseCOSStream(BaseParser.java:439)
at org.apache.pdfbox.pdfparser.PDFParser.parseObject(PDFParser.java:530)
at org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:172)
at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:862)
at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:829)
at org.dopdf.document.read.pdf.PDFDocumentReader.init(PDFDocumentReader.java:98)
To fetch each file from the zip, I use the following code:
ZipInputStream zis = new ZipInputStream(aZipFile); // aZipFile is byte array
ZipEntry entry;
ArrayList<String> nameOfIgnoredFiles = new ArrayList<String>();
byte data[] = null;
while ((entry = zis.getNextEntry()) != null) {
    if (entry.getName().endsWith(".pdf")) {
        int dataSize = (int) entry.getSize();
        data = new byte[dataSize];
        zis.read(data);
        // I use data and pass it to the PDF library.
    } else {
        nameOfIgnoredFiles.add(entry.getName());
    }
}
The data byte array fetched above is then passed on as follows:
PDFDocumentReader document = new PDFDocumentReader(data); // here i get the error
What am I doing wrong? Can you suggest a solution? I suspect the way I fetch the data byte array is the problem. What is the best way to do it?
You are assuming that zis.read(data) fills the buffer. Check the API documentation. It isn’t guaranteed to do that. You are also assuming that the size fits into an int, and that the item itself fits into memory. None of these assumptions is valid.

Surely you can pass the entry’s InputStream to a pdfbox API?
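If the reader really does need a byte array, one way to avoid relying on a single read() call is to loop until read() returns -1, which ZipInputStream does at the end of the current entry. The following is only a minimal sketch: the class name ZipPdfExtractor and the methods readFully and extractPdfs are hypothetical names made up for illustration, and it still holds each PDF fully in memory, so it does not address the size concerns above. It also avoids entry.getSize(), which can return -1 when the size is not known.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

// Hypothetical helper class, not part of PDFBox or any other library.
final class ZipPdfExtractor {
    private ZipPdfExtractor() {}

    // Reads from the stream until read() reports end of data (-1).
    // For a ZipInputStream, that is the end of the current entry.
    static byte[] readFully(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[8192];
        int n;
        while ((n = in.read(buffer)) != -1) {
            // read() may return fewer bytes than requested, so copy only n bytes.
            out.write(buffer, 0, n);
        }
        return out.toByteArray();
    }

    // Collects the bytes of every ".pdf" entry, keyed by entry name,
    // and records the names of all other entries in the ignored list.
    static Map<String, byte[]> extractPdfs(InputStream zipStream, List<String> ignored)
            throws IOException {
        Map<String, byte[]> pdfs = new LinkedHashMap<>();
        try (ZipInputStream zis = new ZipInputStream(zipStream)) {
            ZipEntry entry;
            while ((entry = zis.getNextEntry()) != null) {
                if (entry.getName().endsWith(".pdf")) {
                    pdfs.put(entry.getName(), readFully(zis));
                } else {
                    ignored.add(entry.getName());
                }
            }
        }
        return pdfs;
    }
}
```

Each byte array in the returned map could then be handed to the reader as in the question, for example new PDFDocumentReader(bytes).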