I am using this code to import pages from different PDF files into a single document. When I import large files (200 pages or more), I get an OutOfMemoryException. Am I doing something wrong here?
private bool SaveToFile(string fileName)
{
    try
    {
        iTextSharp.text.Document doc;
        iTextSharp.text.pdf.PdfCopy pdfCpy;
        string output = fileName;
        doc = new iTextSharp.text.Document();
        pdfCpy = new iTextSharp.text.pdf.PdfCopy(doc, new System.IO.FileStream(output, System.IO.FileMode.Create));
        doc.Open();
        foreach (DataGridViewRow item in dvSourcePreview.Rows)
        {
            string pdfFileName = item.Cells[COL_FILENAME].Value.ToString();
            int pdfPageIndex = int.Parse(item.Cells[COL_PAGE_NO].Value.ToString());
            pdfPageIndex += 1;
            iTextSharp.text.pdf.PdfReader reader = new iTextSharp.text.pdf.PdfReader(pdfFileName);
            int pageCount = reader.NumberOfPages;
            // set page size for the documents
            doc.SetPageSize(reader.GetPageSizeWithRotation(1));
            iTextSharp.text.pdf.PdfImportedPage page = pdfCpy.GetImportedPage(reader, pdfPageIndex);
            pdfCpy.AddPage(page);
            reader.Close();
        }
        doc.Close();
        return true;
    }
    catch (Exception ex)
    {
        return false;
    }
}
You’re creating a new PdfReader for each pass. That’s horribly inefficient. And because you’ve got a PdfImportedPage from each one, all those (probably redundant) PdfReader instances are never GC’ed.

Suggestions:
1. Only have one PdfReader “open” at a time, by working through the source files one after another. Use PdfCopy.freeReader() (FreeReader() in iTextSharp) when you’re done with a given reader. This will almost certainly change the order in which your pages are added (maybe a Very Bad Thing).
2. Cache your PdfReader instances based on the file name. Call freeReader() again when you’re done… but you probably won’t be able to free any of them until you’ve dropped out of your loop. The caching alone may be enough to keep you from running out of memory.
3. Call freeReader() after you close a given PdfReader instance.
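
The caching suggestion could look roughly like the sketch below. It is only an illustration under some assumptions: PdfMerger, CopyPages and the pages parameter are hypothetical names that are not in the question's code, the page numbers passed in are assumed to be 1-based already, and the iTextSharp version in use is assumed to expose PdfCopy.FreeReader(). Each source file gets exactly one PdfReader, pages are still added in the order requested, and the readers are freed and closed once the loop is finished.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using iTextSharp.text;
using iTextSharp.text.pdf;

// Hypothetical helper, not part of the original code.
static class PdfMerger
{
    // pages: (source file name, 1-based page number) pairs in the desired output order.
    public static void CopyPages(string output, IEnumerable<KeyValuePair<string, int>> pages)
    {
        // One PdfReader per distinct source file, reused for every page taken from it.
        Dictionary<string, PdfReader> readers =
            new Dictionary<string, PdfReader>(StringComparer.OrdinalIgnoreCase);
        try
        {
            using (FileStream fs = new FileStream(output, FileMode.Create))
            {
                Document doc = new Document();
                PdfCopy copy = new PdfCopy(doc, fs);
                doc.Open();

                foreach (KeyValuePair<string, int> entry in pages)
                {
                    PdfReader cached;
                    if (!readers.TryGetValue(entry.Key, out cached))
                    {
                        // First page requested from this file: open it once and remember it.
                        cached = new PdfReader(entry.Key);
                        readers.Add(entry.Key, cached);
                    }
                    copy.AddPage(copy.GetImportedPage(cached, entry.Value));
                }

                // All pages are copied, so PdfCopy can drop what it holds for each reader.
                foreach (PdfReader used in readers.Values)
                {
                    copy.FreeReader(used);
                }

                // Note: Close() throws if no page was added to the document.
                doc.Close();
            }
        }
        finally
        {
            // Close every reader even if copying failed part-way through.
            foreach (PdfReader opened in readers.Values)
            {
                opened.Close();
            }
        }
    }
}
```

To use it from the question's SaveToFile, the grid loop would only collect new KeyValuePair<string, int>(pdfFileName, pdfPageIndex) entries into a list and pass that list to CopyPages. Memory use then scales with the number of distinct source files rather than with the number of rows in the grid.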