I’m working with PDFs in VB.NET using a DLL I found on CodeProject:
http://www.codeproject.com/Articles/37458/PDF-Viewer-Control-Without-Acrobat-Reader-Installe
My app allows you to select multiple files in a grid and print them. The files are stored in password-protected zip files, so my first step is to extract each selected file to a memory stream that I pass to a new PDF wrapper object. Each object gets added to a queue. Then each object in the queue is printed, page by page, as a System.Drawing.Image. The whole thing runs on a background worker.
Now, extracting the PDFs to the queue uses hardly any memory. But in the PrintPage event handler, when I extract the images and send them to the printer, something must be going wrong. My memory usage explodes. Each image, of course, is large because it’s rendered at 300 dpi, but the memory used by each page isn’t being returned to the OS and neither is it being garbage collected.
In the end, if I select enough files, I run out of memory. Why?
Ok, so I finally figured it out.
First, as far as the images go, the CLR apparently doesn’t know how much memory is allocated for a `Drawing.Image`, so when you dispose one, you have to tell it how many bytes were involved and then force a garbage collection.

Now, the PDF library from the CodeProject sample was quite a bit more difficult.
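Before getting into the library, here is a rough sketch of the image-disposal step described above. It assumes the way to "tell" the CLR is the standard `GC.AddMemoryPressure` / `GC.RemoveMemoryPressure` pair, and that each pixel takes about 4 bytes; the module and method names (`ImageMemory`, `Track`, `Release`) are made up for illustration.

```vbnet
Imports System
Imports System.Drawing

Module ImageMemory
    ' Rough size of the decoded pixel data.
    ' Assumption: 4 bytes per pixel (32bpp); adjust for other pixel formats.
    Function EstimateBytes(img As Image) As Long
        Return CLng(img.Width) * CLng(img.Height) * 4L
    End Function

    ' Call right after a page image has been rendered.
    ' Returns the byte count so the same value can be passed to Release later.
    Function Track(img As Image) As Long
        Dim bytes As Long = EstimateBytes(img)
        GC.AddMemoryPressure(bytes)
        Return bytes
    End Function

    ' Call once the page has been drawn to the printer.
    Sub Release(ByRef img As Image, bytes As Long)
        If img Is Nothing Then Return
        img.Dispose()
        img = Nothing
        GC.RemoveMemoryPressure(bytes)
        ' Forcing a collection is normally discouraged, but it matches the
        ' "perform garbage collection" step described in this answer.
        GC.Collect()
        GC.WaitForPendingFinalizers()
    End Sub
End Module
```

For scale: a Letter-size page at 300 DPI is 2550 x 3300 pixels, which at 4 bytes per pixel is roughly 33.7 MB per page, so it adds up quickly when nothing is released.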
First of all, make sure you call the `Dispose` method on the `PDFWrapper` object, either in the `FormClosed` event of the form that holds the wrapper or in the `Finalize` method of the class that holds it.

But the `PDFWrapper` actually seems to cache the images you retrieve from it. So as you page through a PDF, memory usage grows until the images for the entire PDF are cached. This is an even bigger problem if you use those images to print the PDF at 300 DPI (I get out-of-memory errors toward the end of a 60+ page PDF, at 1.5 GB of memory used).

There is no ‘Clear Cache’ method for this object as far as I can tell. The hack I used to get it working was to grab an image at 1 DPI right after I get the image I need, then perform garbage collection as above. This indirectly frees up the memory that was cached. However, like before, we must tell the CLR how many bytes we used. It’s the same calculation as above.
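The cache-flushing hack can be sketched roughly like this. Because I don't want to misquote the library's API, `renderPage` is a hypothetical delegate standing in for whatever call returns a page image from the wrapper at a given DPI; `CacheFlush` and `RenderAndFlush` are likewise names invented for this example. Whether the image you already hold survives the flush depends on how the wrapper caches, so treat this as an illustration of the idea only.

```vbnet
Imports System
Imports System.Drawing

Module CacheFlush
    ' renderPage(pageNumber, dpi) is a hypothetical stand-in for the wrapper call
    ' that renders a page; it is NOT the library's real signature.
    Function RenderAndFlush(renderPage As Func(Of Integer, Double, Image),
                            pageNumber As Integer) As Image
        ' The image we actually want to print.
        Dim page As Image = renderPage(pageNumber, 300.0)

        ' Request the same page at 1 DPI so that, if the wrapper keeps only the
        ' most recent render, the large cached copy is replaced by a tiny one.
        Using tiny As Image = renderPage(pageNumber, 1.0)
            ' Nothing to do with it; it is disposed immediately.
        End Using

        ' Collect whatever the wrapper let go of.
        GC.Collect()
        GC.WaitForPendingFinalizers()

        Return page
    End Function
End Module
```

The byte accounting from the image-disposal step applies here too: the size passed to the CLR would be that of the 300 DPI image, not the 1 DPI one.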
BUT there is one more problem. The `PDFWrapper` object actually seems to grab the images on another thread. So, when we request a 1 DPI image right after we request the 300 DPI image, it gets confused and randomly spits out 1 DPI images when it should be giving us 300 DPI images to print. So, the workaround for this:

And there you go. Perhaps that’s why, in the CodeProject sample, he uses a different DLL to do the printing. However, the `PDFWrapper` object supports reading from an `IO.MemoryStream`; I don’t think any of the other includes in that project do.

Happy coding to anyone who reads this!