I have just started using iTextSharp to manipulate PDF documents. As a simple exercise, I tried to extract the text from a simple PDF using the code below.
protected void btnUpload_Click(object sender, EventArgs e)
{
    if (fuPDFUpload.HasFile)
    {
        PdfReader reader = new PdfReader(fuPDFUpload.FileBytes);
        for (int i = 0; i < reader.NumberOfPages; i++)
        {
            lblPdfText.Text += PdfTextExtractor.GetTextFromPage(reader, i);
        }
    }
}
The code above throws a NullReferenceException. reader is not null, and i cannot be null because it is an int; besides, if reader were null I would expect an ArgumentNullException instead. reader also has pages, which is why execution enters the loop at all. I can only think this is some kind of bug. iTextSharp is open source, so I could try to fix it myself, but I really don't have the time. Does anyone know what might be going on here, or how I might work around it?
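OK, so PDFs do not have a page 0: iTextSharp numbers pages from 1 to reader.NumberOfPages, so the loop has to start at i = 1 and use i <= reader.NumberOfPages as its bound. With that change the code works fine.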
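For reference, here is a sketch of the handler with that fix applied. It assumes iTextSharp 5.x (PdfReader in iTextSharp.text.pdf, PdfTextExtractor in iTextSharp.text.pdf.parser) and a Web Forms page whose designer file declares fuPDFUpload and lblPdfText. The page class name UploadPage is a hypothetical placeholder, and closing the reader in a finally block is an addition that was not in the original code.

```csharp
using System;
using iTextSharp.text.pdf;
using iTextSharp.text.pdf.parser;

// UploadPage is a hypothetical class name; fuPDFUpload (FileUpload) and
// lblPdfText (Label) are assumed to be declared in the page's designer file.
public partial class UploadPage : System.Web.UI.Page
{
    protected void btnUpload_Click(object sender, EventArgs e)
    {
        if (fuPDFUpload.HasFile)
        {
            PdfReader reader = new PdfReader(fuPDFUpload.FileBytes);
            try
            {
                // iTextSharp page numbers are 1-based: valid pages are 1..NumberOfPages.
                for (int i = 1; i <= reader.NumberOfPages; i++)
                {
                    lblPdfText.Text += PdfTextExtractor.GetTextFromPage(reader, i);
                }
            }
            finally
            {
                // Release the reader's resources once extraction is finished.
                reader.Close();
            }
        }
    }
}
```

The only functional change from the question's loop is the bounds: the start moves from 0 to 1 and the condition from < to <=, so the last page is no longer skipped either.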
That is a very unhelpful exception. You would think there would be a check on the page number that throws something more descriptive; maybe I shall submit a patch when I have time.
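Until the library does that itself, a caller-side guard can produce the clearer error. This is a minimal sketch, not part of iTextSharp: the class CheckedPdfText and its two methods are hypothetical names. It simply rejects page numbers outside 1..reader.NumberOfPages with an ArgumentOutOfRangeException before delegating to PdfTextExtractor.GetTextFromPage, and it assumes C# 6 or later for nameof.

```csharp
using System;
using iTextSharp.text.pdf;
using iTextSharp.text.pdf.parser;

// Hypothetical helper class; not part of iTextSharp.
public static class CheckedPdfText
{
    // Throws a descriptive exception when pageNumber is outside 1..numberOfPages.
    public static void ValidatePageNumber(int pageNumber, int numberOfPages)
    {
        if (pageNumber < 1 || pageNumber > numberOfPages)
        {
            throw new ArgumentOutOfRangeException(
                nameof(pageNumber),
                pageNumber,
                "PDF page numbers are 1-based; expected a value from 1 to " + numberOfPages + ".");
        }
    }

    // Validates the arguments, then delegates to iTextSharp's text extractor.
    public static string GetTextFromPageChecked(PdfReader reader, int pageNumber)
    {
        if (reader == null)
        {
            throw new ArgumentNullException(nameof(reader));
        }
        ValidatePageNumber(pageNumber, reader.NumberOfPages);
        return PdfTextExtractor.GetTextFromPage(reader, pageNumber);
    }
}
```

Dropped into the question's original loop in place of PdfTextExtractor.GetTextFromPage, the i = 0 iteration would fail with a message naming the valid 1-based range instead of a NullReferenceException.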